A ripe time for gaining ground


After three years of heated debate, the advocates and critics of gain-of-function research must work to agree on how best to regulate the work.


Last week, the White House Office of Science and Technology
Policy and the US Department of Health and Human Services
announced an immediate pause in all new government funding
for gain-of-function (GOF) research (experiments to boost the
transmissibility, virulence or host range of pathogens) on influenzas,
Middle East respiratory syndrome and severe acute respiratory syndrome.

The pause is to allow time to develop a new policy on how such work 
should be conducted and regulated. The policy will be informed by a 
full assessment of the risks and benefits of the work, and by how these 
compare with those for safer alternatives. In charge of that assess- 
ment is the National Science Advisory Board for Biosecurity (NSABB), 
which is meeting this week for the first time in two years (see page 
411). The board will also advise on the policy's content. In parallel, the 
National Research Council (NRC) of the National Academies will con- 
vene a scientific conference on the issues surrounding GOF research, 
including its risks and benefits. It will also review the NSABB’s draft 
recommendations for the new policy. 

Controversy over GOF research was first sparked in late 2011 when 
the NSABB attempted to stop the publication of the full results of two 
studies in which the H5N1 avian flu virus had been engineered to 
become transmissible in mammals. The board and others were worried 
that information in the papers could help terrorists or other malevolent 
individuals to develop a bioweapon. Those concerns were finally over- 
ruled, and on 24 September, the United States adopted new rules on 
what is known as dual-use research — work that could be misapplied 
to do harm — on 15 pathogens or toxins. 

A wider concern raised at the time — which has since shifted to 
front and centre — was the risk of a pathogen that had been engi- 
neered to become more dangerous escaping from the lab. In Febru- 
ary 2012, the US Department of Health and Human Services added 
another layer of review for grant proposals involving GOF research, 
but only for H5N1; this was extended to H7N9 in August 2013. The 
research community became deeply polarized over the issues sur- 
rounding GOF work. Some vaunted the benefits of such research for 
pandemic preparedness and down-played biosafety and biosecurity 
risks, whereas others argued that the experiments should not be done 
because the risks far outweighed the benefits. To allow time for debate, 
GOF researchers agreed to put their research on hold, resuming work 
a year later after deciding that enough time had passed. 

The decision to implement another moratorium, and to broaden it to pathogens other than the H5N1 and H7N9 flu viruses, is a belated acknowledgement that the issue of how to handle GOF research is far from resolved. And the revelations over the past few months of serious violations and accidents at some of the leading biosafety containment labs in the United States have punctured the hubris that some scientists, and their institutions, show about their perceived ability to work safely with dangerous pathogens. The US administration cited these revelations as one reason for the latest review; behind-the-scenes lobbying by critics of GOF research also played a part.

The climate for constructive discussion is now perhaps better than it was: although opinions remain sharply divided, each side now seems to be listening more to the other. In July, almost 300 scientists and policy experts signed up to the ‘Cambridge Consensus’, which criticized the lack of a proper risk–benefit assessment of the research, and called for exactly what the US government has now agreed to do. More than 200 scientists responded with ‘Scientists for Science’, which defended GOF research and the ability to carry it out safely, but acquiesced to the possibility of further discussion, as long as it was done “under the auspices of a neutral party”, such as the US National Academy of Sciences.

In a sign of the potential for common ground, Ian Lipkin, a renowned virus hunter at Columbia University in New York, saw fit to sign both calls. Both arguments have merit, he says, but both are also incomplete. Last week, a group of scientists, including both opponents and supporters of GOF flu research, published a sober assessment of the potential and limitations of current approaches to assess the potential pandemic risk of various flu viruses (C. A. Russell et al. eLife 3, e03883; 2014). It paints a much more nuanced picture than some of the bold claims made earlier for GOF research. We need more such balanced analyses, and fewer dogmatic opinions, on both sides.




The ice bucket 


Social-media fun for medical research 
bypasses animal sensitivities. 


Steven Spielberg did it. Nobel laureates Thomas Südhof and Shinya Yamanaka did it. The fashion world’s Naomi Campbell and Victoria Beckham did it. Physicist Stephen Hawking, who has the disease amyotrophic lateral sclerosis (ALS), watched as his children did it on his behalf. They, perhaps you, and millions of others all took the ‘ice-bucket challenge’.

Even if the name is unfamiliar, the images are unlikely to be. The 
challenge involved being filmed as you had a bucket of iced water 
thrown on you. For the privilege, most people pledged money for 
research into ALS, also known as motor-neuron disease, and then 
nominated others to take the challenge. The resulting little movies were 
posted on the Internet. It was a lot of fun. 

As many of the people who took the challenge understand, ALS is a dreadful illness. Motor neurons in the brain and spinal cord degenerate, leading to paralysis. It is relatively rare, affecting 4-7 in 100,000 people. But there is no cure, and no good understanding of its cause.

The ice-bucket challenge emerged in the United States in July and 
went viral around the globe, peaking in August. During that month, the 
ALS Association in Washington DC received more than US$100 mil- 
lion in donations, compared with $2.8 million collected during August 
2013. Already, the association has distributed some $20 million of that 
for research. ALS societies in Germany and the Netherlands hauled in 
more than $1 million each. Australia managed more than $2 million and 
Japan more than a quarter of a million. The UK Motor Neuron Disease
(MND) Association in Northampton attracted 910,000 donations in 
just three August weeks, compared with its average monthly score of 
13,000. Research has never benefited from a social-media phenomenon 
to this extent before. 

The success of the activity is an endorsement of medical research by 
the general public. The associations that benefited have been careful 
to explain that the money will be distributed through expert review. 
This means that only the best research will be funded. Yet during all the 
excitement, what mention was made of the fact that research leading 
to effective treatments will eventually, one way or another, require the 
use of animals? 

The research collaborations chosen on 2 October in the ALS Asso- 
ciation’s first round of funding are mostly based on human genomic 
and stem-cell approaches, which tactfully avoids the animal issue. By 
contrast, beneficiaries of the MND Association’s windfall include both 
clinical research and research that uses animal models. ALS is a disease 
that can be caused by different factors in different people. Because its 
aetiology is so poorly understood, the animal models generated so far 
— in, for example, flies, mice and monkeys — are not totally reliable. 
Much will be gained from the human-genetic approaches now under 
way. They could help to develop better animal models. 

Would members of the public have participated so joyously in the
activity if they had known that research on animals might benefit from
their donations? Had that sensitive question been raised, the mood 
might have been different and its consequences for medical research 
damaged. But glossing over the reality of such research is not a good 
strategy for avoiding crises; instead, life scientists and their organiza- 
tions should take every opportunity to say when animals have been 
used in research, and to explain why. Societal discussions about respon- 
sible animal research need to take place outside periods of crisis. 

It is encouraging to see the tide slowly turning towards such openness: witness the MND Association’s upfront funding of the full spectrum of necessary research. And outside the ice-bucket excitement, last week saw another major advance. On 13 October, the US Society for Neuroscience and the Federation of European Neuroscience Societies combined their might to publish, for the first time, a public statement in support of a neuroscientist under attack: Nikos Logothetis, a director at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany, who works with monkeys. His lab had been infiltrated by an animal activist who filmed the primates there, and the videos were used as propaganda by organizations opposed to any research on animals. (An independent investigation at the institute declared that there were no systematic problems with animal care there.)

This sort of vocal support for research is important. Logothetis’s work 
on the brain is fundamental, but applied research on degenerative dis- 
eases, including ALS, will be aided by a better understanding of the 
complex organ in which the diseases originate. 

There are many ways to support medical research. Engaging people's 
enthusiasm with actions such as the ice-bucket challenge is an impor- 
tant one. Public support by scientific organizations for the responsible 
actions of their members is another. The challenge is great, the need
even greater.




Toxic influence 


Europe must act to stop livestock drugs from 
wiping out its vulture populations. 


A dead vulture in Spain could herald a crisis for raptor populations, because a drug that has killed hundreds of thousands of birds and driven some species to the brink of extinction in Asia now threatens to do the same in Europe. The European Medicines Agency (EMA) must clamp down on the drug.

The Spanish bird died two years ago. Now, the probable cause has 
been identified as a drug given to livestock (I. Zorrilla et al. Conserv. 
Biol. http://doi.org/wf5; 2014). Events in Asia show how serious the 
consequences could be. In the 1990s, vultures on the Indian sub- 
continent started dying in huge numbers. Some populations lost more 
than 95% of their animals. The consequences were catastrophic. As 
the skies cleared, dead livestock were left to rot in fields. 

Research finally pinned the blame on the anti-inflammatory drug 
diclofenac, which had become widely used in cattle for problems rang- 
ing from pneumonia to mastitis. Although harmless to bovines, it 
is highly toxic to vultures that feed on the carcasses (J. L. Oaks et al. 
Nature 427, 630-633; 2004). 

As a result, India, Pakistan and Nepal placed heavy restrictions on 
the use of the drug in livestock. And although campaigners say that 
large vials officially designated for human use are often repurposed by 
veterinarians, the threat to the vultures of Asia has decreased. Num- 
bers have not yet recovered, and in some cases are still declining, but 
the birds at least now stand a chance. 

Europe is heading in the opposite direction. Despite warnings from scientists, Spain, home to the vast majority of Europe's vultures, last year licensed diclofenac for livestock use. The EMA is considering the risks posed by the drug, and is scheduled to reach a decision by the end of November.

The discovery that the 2012 vulture was probably felled by a related 
drug, called flunixin (see Nature http://doi.org/wfx; 2014), is wor- 
rying for two reasons. First, it shows that diclofenac is not the only 
product in the class known as non-steroidal anti-inflammatory drugs 
(NSAIDs) that has the potential to kill vultures and other birds of prey. 
Second, it shows that carcasses containing significant quantities of 
these drugs are reaching the wild-animal food chain in Europe — in 
this case, probably through the Spanish tradition of wild-animal feed- 
ing stations known as muladares. 

Two things should now happen. The EMA must move to heavily 
restrict — if not ban — the use of diclofenac in livestock. An alternative 
drug that does not harm vultures — meloxicam — is already available, 
and vets should use this in preference. And, as urged by the research- 
ers who reported the flunixin-killed vulture, regulators should look 
at the effects of all NSAIDs used in livestock on vultures. Although 
diclofenac could well be the most deadly, we must know what other 
drugs also pose a threat to birds that feast on carrion, and how they 
might be managed. 

In the longer term, regulators in Spain and the rest of the European 
Union need to ask how a drug with such evidence of environmental 
damage was allowed to come onto the market. 

Spain is an important stronghold for vultures, and this alone would 
be reason enough to look seriously at restricting the use of diclofenac. 
But the European Union needs to set an example 
for the rest of the world. If it allows diclofenac 
use to continue, countries such as India could 
well decide to ease their restrictions, and African 
nations may rethink their plans to ban it.
WORLD VIEW


Stormy outlook for
long-term ecology studies


The closure of a 40-year project to understand and protect seabirds shows the false priorities of funders, warns Tim Birkhead.


In the early months of this year, a series of fierce storms battered Europe’s western seaboard. Seabirds struggle to feed in rough water, and some 40,000 of them soon washed up dead on beaches. Climate change is expected to increase the frequency of such storms, so to understand the impact of global warming on ecosystems, we need to analyse the long-term biological impact of these events.

Until recently, I was in an excellent position to do this. For more than 40 years, I have studied populations of guillemots on Skomer Island, off the coast of Wales. My research has revealed, for example, that the birds now breed two weeks earlier than they did in the 1970s, probably owing to climate change.

This kind of research is not easy. It has taken four decades to accumulate the data necessary to understand how the population works because to do so requires accurate measures of how long adult guillemots live, how many chicks they produce, how old they are when they breed, what proportion of young birds survive to breed and so on.

No more. Funding for the project has been axed. As it stands, I have no money to pay a research assistant to help me identify and count exactly how many of the birds have managed to survive the storms.

To assess the storms’ effects, we need to gather data from the 2015 breeding season to feed into the statistical models we use to calculate survival. It is frustrating that officials chose this moment to terminate our funding, when we have such an important opportunity to assess the vulnerability of seabirds to climate change.

Guillemots are one of our most abundant seabirds, and they are excellent indicators of the quality of the marine environment. For example, they are desperately vulnerable to oil pollution, and tens of thousands have died in oil spills such as those resulting from the sinking of the Torrey Canyon (1967) and Erika (1999) oil tankers. Partly as a consequence of such disasters, guillemot numbers have fluctuated widely over the past 80 years.

In the 1930s, Skomer’s guillemot population stood at around 100,000 pairs. By 1972, when I started to work with them, the numbers had fallen to just 2,000 pairs, probably owing to oil spills from ships sunk nearby during the Second World War. Since the 1980s, the numbers have increased, and there are now around 25,000 pairs.

For the past 20 years, this study, the aims of which are to understand the population biology of guillemots and to implement a scientifically robust monitoring scheme, was funded by the Countryside Council for Wales. But in 2013, the council was consumed by a new quango, Natural Resources Wales (NRW), which terminated the funding of about £12,000 (US$19,000) per year.


NRW implied there was a shortage of cash, but I think the move 
was down to a change in priorities. NRW does not seem to value what 
my study has achieved: a comprehensive health check for guillemots. 
There is a feeling out there that conservation and monitoring is low- 
quality science and should be cheap; there is also a feeling that moni- 
toring does not matter. 

For all those biologists who start what turn out to be long-term 
studies, continuity of funding is a major problem. Most research 
grants are for 3-5 years, and in the current economic climate it is hard 
to predict whether funding will be renewed. Of course, all research- 
ers dream of continuous funding, but long-term ecological studies 
are a special case. They are often disproportionately successful in 
terms of new discoveries because researchers know their system or 
study species extremely well and under various 
environmental conditions. 

Long-term population studies have shown, 
for example, that unlike humans, female chim- 
panzees do not experience a menopause. They 
have revealed that the age at which mute swans 
start and stop reproducing is a heritable trait. 
And they have demonstrated how rare environ- 
mental events — such as total food failure in 
one year — can turn cooperative, peaceful birds 
into selfish, brutal killers of their neighbours’ 
offspring. 

The current focus by the main funding bodies 
on what they consider economically useful 
research with a quick return is short-sighted. 
When my study started in the 1970s, climate 
change was barely on anyone's radar. The main 
benefit of long-term studies is that they allow 
researchers to address problems that no one has yet imagined. If we 
are to have any hope of conserving species, we need to understand 
them, and we need to understand the way they are affected by envi- 
ronmental change. 

Back in 1972, the aim of my original PhD project, supervised by 
Chris Perrins and the late David Lack, was to understand the dynam- 
ics of the declining population of guillemots on Skomer. Lack was 
famous for his work on the population biology of birds, an interest 
that was encapsulated in one of his best-known books, The Natural 
Regulation of Animal Numbers (1954). Quite what he thought I could 
achieve in a three-year PhD is still a mystery to me, given that guil- 
lemots live for at least 20 years and do not start breeding until they are 
at least five years old. 

Forty years on, Perrins asked me whether I would soon be 
completing the project he set me. I would dearly like to.


Tim Birkhead is professor of zoology at the University of Sheffield, UK.
e-mail: t.r.birkhead@sheffield.ac.uk
RESEARCH HIGHLIGHTS
Selections from the scientific literature


Obesity link to jet- 
lagged microbes 


Disrupted sleep patterns alter 
the composition of gut bacteria, 
leading to metabolic problems. 

Eran Elinav at the Weizmann 
Institute of Science in Rehovot, 
Israel, and his team found that 
the abundance of gut microbes 
in mice fluctuates daily in sync
with host feeding times. But 
when the team genetically 
disabled the animals’ circadian 
clocks or shifted them by 
eight hours, the bacteria lost 
this rhythmicity and their 
composition changed. 

Jet-lagged mice eating a 
high-fat diet gained more 
weight and showed an 
increased susceptibility to 
diabetes compared with 
normal mice that were fed the 
same food. Jet-lagged humans 
had more bacteria called 
Firmicutes — which have been 
linked to metabolic disease — 
in their guts than before their 
transatlantic trips. 

The findings could explain 
why shift workers have a higher 
risk of obesity and diabetes. 
Cell http://doi.org/wfh (2014) 


CONSERVATION


Horn trade could
save rhinos


Wild southern white rhinoceroses could go extinct in just nine years because of poaching, but could be saved if trade in their horns were to be carefully managed.

Poachers killed almost 1,000 southern white rhinoceroses (Ceratotherium simum simum; pictured) for their horns in 2013, some 5% of the total population. Enrico Di Minin of the University of Helsinki and his colleagues used population and economic models to estimate extinction risk and the cost of anti-poaching patrols.

The models suggest that the species could be saved by a carefully controlled trade in horn collected from rhinos that die naturally or harvested from live animals without killing them. Money from this would fund increased anti-poaching patrols and create an income source for local people, deterring them from poaching.
Conserv. Biol. http://dx.doi.org/10.1111/cobi.12412 (2014)


SOLAR PHYSICS


Solar atmosphere is a hotbed of activity


Explosions of plasma in the Sun’s atmosphere can reach temperatures of nearly 100,000°C, much hotter than scientists had expected.

The finding is one of several about the region between the solar surface and the uppermost edge of the Sun’s atmosphere, or corona, revealed by NASA’s Interface Region Imaging Spectrograph (IRIS) mission. The spacecraft (pictured before its launch) found that much of the energy from solar flares goes into heating and accelerating the plasma explosions, reports a team led by Hardi Peter of the Max Planck Institute for Solar System Research in Göttingen, Germany.

Viggo Hansteen of the University of Oslo and his co-workers found short loops of magnetized plasma that flicker out within minutes and could help to explain how the corona gets so hot.

Jets of charged particles less than 300 kilometres wide also occasionally appear for up to 80 seconds, and may fuel the solar wind, say Hui Tian of the Harvard-Smithsonian Center for Astrophysics in Cambridge, Massachusetts, and his colleagues.

Science http://doi.org/wfc; http://doi.org/wfd; http://doi.org/wff (2014)


STEM CELLS


Cell transplants
enhance vision


Implanted retinal cells derived from stem cells seem to be improving vision in some people in two early-stage clinical trials.

Steven Schwartz at the University of California, Los Angeles, Robert Lanza at Advanced Cell Technology in Marlborough, Massachusetts, and their team grew retinal pigmented epithelial cells from human embryonic stem cells and transplanted them into the retinas of 18 people who have one of two forms of macular degeneration, which results in the loss of central vision.

After about two years, there have been no serious side effects from the cells, such as abnormal growth. Ten participants reported seeing more letters on an eye chart than before the treatment.

The transplanted cells are support cells that do not directly enable vision, so it is not known how eyesight has improved. The authors could not rule out placebo and other bias effects.

Lancet http://doi.org/wdf (2014)


PHOTONICS 


Laser moves items 
long distances 


A laser beam can move matter 
tens of centimetres and in two 
directions. 

Such tractor beams have 
been used to shift small objects 
very short distances. To scale 
this up, Wieslaw Krolikowski 
at the Australian National 
University in Canberra and his 
team fired a laser beam at gold- 
coated hollow glass spheres in 
air. The photons heated up the 
spheres, creating a temperature 
difference between their far and 
near surfaces. This generated 
a force that pushed the shells 
in the opposite direction to the 
beam. By changing the beam’s 
polarization state, the team 
was able to stop the spheres or 
reverse their direction. 

The authors say that the 
technique could be used to 
gather samples remotely and 
for other applications. 

Nature Photonics http://doi.org/wft (2014)


METEOROLOGY 


Tornadoes growing 
more clustered 


Tornadoes in the United States 
have been happening on fewer 
days since the 1970s, but more 
tornadoes have touched down 
(pictured) on those days. 

The overall number of US 
tornadoes has not changed in 
recent decades. However, in
analysing the national tornado
database, Harold Brooks of 
the National Severe Storms 
Laboratory in Norman, 
Oklahoma, and his team found 
that the number of days with 
at least one tornado has fallen 
from 150 to 100 since the early 
1970s. Over the past decade, 
20% of US tornadoes occurred 
on just three days of the year. 
Whether the change is linked 
to rising global temperatures is 
not clear, the authors say. 
Science 346, 349-352 (2014) 


Molecule boosts 
brain rewiring 


Blocking a brain-cell receptor 
boosts the brain's ability to form 
new neuronal connections as it 
adapts to changing stimuli. 

Carla Shatz at Stanford 
University in California and 
her colleagues disrupted the 
receptor, PirB, in the visual 
centre of mouse brains by 
either genetically deleting it or 
blocking it with a molecule. 

They found that when these 
mice were forced to use only 
one eye, circuits in their visual 
cortices were able to rewire 
better than those of normal 
mice. This happened even 
in adulthood, when brain- 
cell rewiring becomes more 
difficult. In a mouse model
of amblyopia, or ‘lazy eye’, the
blocking molecule made the 
brain sensitive to signals from 
the unused eye, allowing better 
vision in that eye. 

Targeting PirB could be a
way to treat amblyopia and 
other brain disorders, the 
authors say. 

Sci. Transl. Med. 6, 258ra140 
(2014) 


Strange fossil is a 
vertebrate cousin 


Bizarre 500-million-year- 
old sea creatures called 
vetulicolians are relatives of 
vertebrates. 
Palaeontologists have struggled to identify the relationship between living animals and these extinct organisms, because of their odd combination of features such as gill slits and a segmented abdomen. A team led by Diego Garcia-Bellido at the University of Adelaide and John Paterson at the University of New England in Armidale, both in Australia, analysed a fossil vetulicolian from a South Australian island.

The fossil, a new species named Nesonektris aldridgei, shows the outline of a notochord, a rod-like structure that develops into the backbone in vertebrates.

Although N. aldridgei is distantly related to vertebrates, its closest relatives are tunicates, invertebrates that swim or attach themselves to underwater rocks. It was probably a free-swimming filter-feeder, say the authors.
BMC Evol. Biol. 14, 214 (2014)


SOCIAL SELECTION


Popular articles
on social media


Pros and cons of the PhD glut


Amid increased competition for faculty jobs in biomedicine, some have suggested cutting the number of PhD students. So when a senior scientist advised against this, the online world took notice. Eve Marder, a neuroscientist at Brandeis University in Waltham, Massachusetts, argued in the journal eLife that it is hard to predict who will excel in science, so any attempt to limit access to PhD programmes will inevitably exclude potential stars. The reaction was mixed. “Reduce the number of admitted graduate students? Agree with Eve Marder: not the greatest idea,” tweeted Sergey Kryazhimskiy, an evolutionary biologist at Harvard University in Cambridge, Massachusetts. But Mike White, a geneticist at Washington University School of Medicine in St. Louis, Missouri, argued in a blog post that Marder was “perpetuating the PhD pyramid scheme”.


eLife 3, e04901 (2014)


Based on data from altmetric.com. Altmetric is supported by Macmillan Science and Education, which owns Nature Publishing Group.


CANCER


Immunotherapy
beats leukaemia


Engineering certain immune cells to kill cancerous cells in leukaemia has driven the disease into remission for up to two years in more than half of participants in an early-stage clinical trial.

Stephan Grupp at the 
Children’s Hospital of 
Philadelphia, Pennsylvania, 
and his co-workers tested 
their approach on 30 people 
with acute lymphoblastic 
leukaemia, including 
25 children, who had failed 
to respond to conventional 
treatment or relapsed. 

The team engineered a 
patient’s T cells to express 
a receptor that targets the 
cancerous B cell, and infused 
the T cells back into the person. 
After one month, 27 people 
were in remission, and after up 
to 2 years, 78% survived — a 
much higher rate than with 
chemotherapy. Those in 
remission had high blood 
levels of the engineered T cells. 

However, all of the 
participants had inflammatory 
side effects that required 
hospitalization. 

N. Engl. J. Med. 371, 1507-1517 
(2014) 
SEVEN DAYS


RESEARCH
Beyond Pluto 


Planetary researchers have 
identified three possible 
targets for NASA’s New 
Horizons spacecraft after
it flies past Pluto next year.
Mission scientists used the 
Hubble Space Telescope to 
search the Kuiper belt in the 
distant reaches of the Solar 
System for objects that the 
probe could reach with its 
limited amount of fuel. Of 
the three, the best candidate, 
dubbed PT1, is several tens 
of kilometres across. If NASA 
approves funding for the visit, 
New Horizons would fly by 
PT1 in late 2018 or early 2019. 


EVENTS 


Quake trial appeal 
On 18 October, an Italian 
court heard opening 
arguments from defence 
attorneys appealing against 
the conviction of six scientists 
and a government official 
found guilty of manslaughter 
over the earthquake that hit 
L’Aquila, Italy, on 6 April
2009. On 31 March 2009, the 
scientists had been part of an 
expert panel on seismic risk 
that prosecutors say resulted 
in a reassuring message 
being released to the public. 
According to the charges, that 
message led to the death of 
L’Aquila residents who stayed
in their homes when they 
would otherwise have left. 
See go.nature.com/xf4smu 
for more. 


Ebola update
A US$1-billion appeal by the United Nations to help fight Ebola in West Africa has yielded only about $376 million in pledges, the organization said on 16 October. Last week, the World Health Organization (WHO) estimated that mortality rates had neared 70%, and warned that Liberia, Sierra Leone and Guinea could see up to 10,000 new cases per week by December. The WHO also declared Nigeria and Senegal free of Ebola after 42 days had passed since their last reported case. In the United States, President Barack Obama on 17 October appointed former White House adviser Ron Klain to oversee the country’s response to Ebola.


Snail find ruffles feathers


The rediscovery of a colourful snail (pictured), declared extinct in 2007, on Aldabra atoll in the Seychelles Islands has revived tensions over the study published in Biology Letters that reported the animal's disappearance (J. Gerlach Biol. Lett. 3, 581-585; 2007). At the time, biologist Clive Hambler of the University of Oxford, UK, and his colleagues penned a comment that criticized the study’s methods, contested the claims of extinction and requested a retraction. But the comment was rejected for publication by the journal, and the Intergovernmental Panel on Climate Change has since cited the snail as a prime example of species loss caused by climate change. Conservation workers, however, rediscovered the Aldabra banded snails (Rhachistia aldabrae) in a remote part of the atoll in August 2014. On 15 October, editor-in-chief Richard Battarbee said in an editorial that the journal had invited Hambler to resubmit his comment, but that the invitation had been declined.


Plea offer 


Prosecutors have offered a plea 
deal to Dong Pyou Han that 
could resolve his fraud case 
without a trial. The former 
Iowa State University scientist 
is accused of misconduct in 
an AIDS-vaccine study. In
court documents filed on 

16 October, Han’s defence 
team asked for more time to 
consider the terms, which must 
be translated into Korean, and 
requested postponement of 
the trial, set for 3 November. In 
December 2013, the US Office 
of Research Integrity reported 
finding that Han spiked rabbit 
blood samples with antibodies 
to produce false results that 
were widely disseminated. 


Costly case 

The University of California, 
Los Angeles, spent almost 
US$4.5 million on defending 
chemist Patrick Harran against 
criminal charges stemming 
from a research assistant’s 


death in a 2008 fire in his 

lab, according to documents 
revealed by the Los Angeles 
Times on 16 October. The 
publicly funded university, 
which has since spent more 
than $20 million on improving 
lab safety, says that taking on 
the legal expenses was within 
its rights and obligations. 

In June, Harran agreed to 

a deal that could result in 
charges being dismissed (see 
go.nature.com/vpdwvz). 


POLICY 


Pathogen pause 

The US White House Office of 
Science and Technology Policy 
on 17 October announced 
a mandatory moratorium
on new funding for gain-of-function
research, which
increases the deadliness 

of pathogens such as the 

virus that causes influenza. 

The moratorium suspends 
government funding for such 
work, and asks scientists who 
have already been funded to 
pause their research voluntarily 
while two non-regulatory 
bodies — the National Science 
Advisory Board for Biosecurity 
(NSABB) and the National 
Research Council — assess the 
risks. See pages 403 and 411 

for more. 


Australian edge 


The Australian government 
unveiled a national strategy on 
14 October for strengthening 
ties between industry and 
science, and increasing 
commercial returns on 

public research funding. The 
‘competitiveness agenda’
established the Commonwealth 
Science Council, a group 

of ten appointed experts 

from business and research, 
which will become the 
leading body for advising 

the government on science 
and technology. Five centres 
will be set up, at a projected 
cost of Aus$188.5 million 
(US$165 million), to 

improve public-private 
science collaborations, and 

to boost Australia’s mining, 
oil and medical technologies 
industries. See go.nature.com/ydbkdd
for more.


TREND WATCH 


An analysis by Google suggests
that, compared with 1995, a
greater proportion of the 1,000
top-cited papers now appear in
publications other than the ten
top-cited journals (A. Acharya
et al. Preprint at http://arxiv.org/abs/1410.2217;
2014). Phil Davis,
a publishing consultant in Ithaca, 
New York, says that the trend 
could be partly due to an ever- 
increasing number of articles 
being published coupled with a 
decline in the number of articles 
being published in elite journals 
(see go.nature.com/qbztbe). 


Nuclear waste site 
In a long-delayed safety
evaluation, the US Nuclear 
Regulatory Commission said 
on 16 October that designs 

for a proposed national 
nuclear-waste disposal site at 
Yucca Mountain in Nevada 
(pictured) contained the 
necessary barriers for long- 
term isolation of radioactive 
waste. Congress designated the 
site as a potential repository 
in 1987, and in 2008 the 
Department of Energy applied 
for a construction licence. But 
the project has been stalled for 
years by political opposition 
and legal challenges (see 
Nature 473, 266-267; 2011). 
Last week's report spurred 
renewed calls by Republicans 
to revive the project. 


BUSINESS
Fusion dreams 


Defence contractor Lockheed 
Martin announced ambitious 
plans for nuclear fusion on 


15 October, with a ten-year 
road map to commercialize 

a reactor small enough to fit 

on a lorry. Lockheed’s Skunk
Works division in Palmdale, 
California, has so far been 
tight-lipped about the results of 
its years of research, and several 
outside experts have responded 
to the company’s lofty goals 
with scepticism. See go.nature.com/u597zk
for more.


Buyout backdown 


The biopharmaceutical 
company AbbVie seems 

to be backing away from a
US$54-billion bid to take over 
Shire, a drug manufacturer 
based in Dublin. On 

15 October, AbbVie’s board 
of directors recommended 
that stockholders in the firm, 
based in North Chicago, 
Illinois, reject the deal in the 
event of a vote. The company 
cited recent changes to US tax 
rules that eliminated some 

of the anticipated financial 
benefits of the transaction. If 
the deal does not go through 
by 14 December, AbbVie 

will pay Shire a $1.6-billion 
break fee. 


Genomics boost 
Genetic-analysis company 
Illumina announced on 

15 October the selection of 
three start-up firms to help 
to develop biotechnology 
applications for next- 
generation sequencing. 
Encoded Genomics,
EpiBiome and Xcell
Biosciences are the first
participants in the Illumina
Accelerator programme,
which will give each company
a US$100,000 loan; lab and
office space in San Francisco,
California; and extra support
for experiments and reagents.
See go.nature.com/tynw7o
for more.


ELITE JOURNALS LOSING DOMINANCE?


Most of a field’s top-cited articles are still published in its top-cited
journals, but the proportion has declined since 1995.

[Bar chart: proportion of 1,000 top-cited articles (%) published in top-ten journals versus other journals, in 1995 and 2013, for all articles and by field: physics and maths; computer science; social sciences; chemical and material sciences; life and Earth sciences; health and medical sciences.]


27-31 October

The Intergovernmental
Panel on Climate
Change meets in
Copenhagen to adopt a
report that synthesizes
findings of its Fifth
Assessment Report, and
to approve the report’s
summary for policy-
makers.


29-31 October

In Aix-en-Provence,
France, the 3rd
International
Conference on
Biodiversity and
the UN Millennium
Development Goals
tackles sustainability
and food security.
go.nature.com/e8snpu


Desertec dries up 


Following the withdrawal of 
most of its shareholders, the 
Desertec Industrial Initiative 
(Dii) has buried ambitious 
plans to supply Europe with 
power from solar plants and 
other renewable sources 
across North Africa and the 
Middle East. Only three of 
its existing 19 shareholders 
remained after a meeting last 
week in Rome. Saudi Arabia’s 
ACWA Power, Germany’s 
RWE and the State Grid 
Corporation of China plan 
to remodel Dii into a service 
company for facilitating 
regional renewable-energy 
projects in North Africa and 
the Middle East. 


NATURE.COM
For daily news updates see:
www.nature.com/news
Outbreaks of influenza have prompted research into strains with pandemic potential.


BIOSECURITY 


US suspends risky 
disease research 


Government to cease funding gain-of-function studies that 
make viruses more dangerous, pending a safety assessment. 


BY SARA REARDON 


The US government surprised many
researchers on 17 October when it

announced that it will temporarily stop 
funding new research that makes certain viruses 
more deadly or transmissible. The White House 
Office of Science and Technology Policy is also 
asking researchers who conduct such ‘gain-of- 
function’ experiments on influenza, severe acute 


respiratory syndrome (SARS) and Middle East 
respiratory syndrome (MERS) to stop their 
work until a risk assessment is completed — 
leaving many unsure of how to proceed. 

“I think it’s really excellent news,” says Marc
Lipsitch, an epidemiologist at the Harvard
School of Public Health in Boston, Massachusetts,
who has long called for more oversight
for gain-of-function research. “I think it’s common
sense to deliberate before you act.”

Critics of such work argue that it is unnecessarily
dangerous and risks accidentally releasing
viruses with pandemic potential, such as
an engineered H5N1 influenza virus that easily
spreads between ferrets breathing the same
air. In 2012, such concerns prompted a global
group of flu researchers to halt gain-of-function 
experiments for a year (see Nature http://doi.org/wgx;
2012). The debate reignited in July,
after a series of lab accidents involving mishan- 
dled pathogens at the US Centers for Disease 
Control and Prevention in Atlanta, Georgia. 

The White House’s abrupt move seems to be a
response to renewed lobbying by gain-of-func- 
tion critics who wanted such work suspended 
and others who sought to evaluate its risks and 
benefits without disrupting existing research. 

Arturo Casadevall, a microbiologist at the 
Albert Einstein College of Medicine in New 
York City, calls the plan “a knee-jerk reaction”. 
“There is really no evidence that these experi- 
ments are in fact such high risk,” he says. “A 
lot of them are being done by very respectable 
labs, with lots of precautions in place.”

Some researchers are confused by the mora- 
torium’s wording. Viruses are always mutating, 
and Casadevall says that it is difficult to deter- 
mine how much mutation deliberately created 
by scientists might be “reasonably anticipated” 
to make a virus more dangerous — the point 
at which the White House states research must 
stop. The government says that this point will be 
determined for individual grants in discussions 
between funding officers and researchers. 

One of the most prominent laboratories 
conducting gain-of-function studies is run by 
Yoshihiro Kawaoka, a flu researcher at the University
of Wisconsin–Madison. In 2012, Kawaoka
published a controversial paper reporting
airborne transmission of engineered H5N1 flu
between ferrets. He has since created an H1N1
flu virus using genes similar to those from the
1918 pandemic strain, to show how such a
dangerous flu could emerge. The engineered
H1N1 was transmissible in mammals and
much more harmful than the natural strain.

Kawaoka says that he plans to comply with 
the White House directive to halt current 
research once he understands which of his 
projects it affects. “I hope that the issues can 
be discussed openly and constructively so that 
important research will not be delayed indefinitely,”
he says.

But it seems that the freeze could be lengthy. 
The White House says that it will wait for
recommendations from the US National
Science Advisory Board for Biosecurity 
(NSABB) and the National Research Council 
before deciding whether and how to lift the 
ban. The groups are expected to finish their 
work within a year. As Nature went to press, the 
NSABB was set to convene on 22 October, its 
first meeting in two years. Lipsitch, who will 
speak at the event, says that he will advocate for 
the development of an objective risk-assessment 
tool to evaluate individual research projects. In 
particular, he says, decision-makers should con- 
sider whether a gain-of-function study makes 
a contribution to a public-health goal, such as 
the prevention and treatment of flu, that could 


justify both the risk and the use of money that 
could be spent on safer research. 

“There clearly are going to be instances 
where gain-of-function research is necessary 
and appropriate, and there are others where the 
opposite applies,” says Ian Lipkin, a virologist 
at Columbia University in New York City. The 
need to understand the ongoing Ebola out- 
break in West Africa and control its spread, for 
instance, emphasizes the importance of infec- 
tious-disease research — as well as the regula- 
tion of such work, Lipkin says. Although public 
worry about Ebola being transferred through 
the air is unfounded, researchers could make 
a case for the need to determine how the virus 


could evolve in nature by engineering a more 
dangerous version in the lab. “I think we should 
have some sort of guidelines in place before such 
experiments are even proposed,” says Lipkin.

Yet Ebola is not included in the White House's
research-funding ban, and a spokesperson says
that there are no plans to include it on the list.

One thing is certain, says Casadevall: the
NSABB meeting will see heated debate
as scientists from all sides convene. “I hope they
have enough room,” he adds.

SEE EDITORIAL P.403
US midterm elections offer
little hope for science 


November vote is unlikely to break a political stalemate that has squeezed research funding. 


BY LAUREN MORELLO 


When US voters head to the polls
on 4 November, they are poised 
to set in motion a major political 


shift that promises to intensify partisan strife 
over issues such as climate change, immigra- 
tion and research funding. For the first time 
since 2006, Republicans are likely to win full 
control of the US Congress — having seized 
the House of Representatives in 2010, they are 
now predicted to take control of the Senate. 
The development seems inauspicious for 
US researchers who depend on government 
funding. Prominent Republicans have repeat- 
edly questioned the veracity of biological evo- 
lution and human-induced climate change, 
and party leaders’ push for drastic spending 
cuts has resulted in across-the-board reduc- 
tions known as sequestration, which slashed 
5.1% from science agencies’ budgets in 2013. 
Yet in fact, the changing balance of power 
is expected to have little practical impact — 
because Congress may not be able to do much 
of anything. Experts see little hope of breaking 
the political gridlock that has made the current 
Congress, which began in January 2013, argu- 
ably the least productive in modern history. 
“It doesn’t matter what happens to the Senate,”
says Michael Lubell, director of public affairs
for the American Physical Society in Washington
DC. “The outcome is going to be the same.”
Although Republicans already hold a com- 
manding advantage in the House of Represent- 
atives, they are expected to win only a simple 


The deadlocked US Congress has been one of the least productive in modern history. 


majority in the 100-member Senate, not the 
three-fifths majority that generally enables a 
party to pursue its legislative agenda without 
drawing any minority-party support. 

That is a double-edged sword. Congress is 
unlikely to approve large increases in funding. 
But Republicans will have a hard time pushing 
through bills to enact the more extreme parts 
of their agenda, such as blocking new federal
regulations to cut carbon emissions, or a plan
to require the National Science Foundation to
certify that all of its grants serve the ‘national
interest’. (That proposal, from the House science
committee chairman Lamar Smith (Republican,
Texas) is targeted mostly at funding for research 
in the social and behavioural sciences.) And 
even if such legislation were approved by the 
House and Senate, President Barack Obama 
would almost certainly exercise his veto power. 

Relations are so bad between the two major 
parties that even the US pledge to help stop 
the Ebola outbreak in West Africa has sparked 
bickering. Earlier this month, Senator James 
Inhofe (Republican, Oklahoma) and sev- 
eral high-ranking Republicans in the House 
temporarily blocked Obama’s request for 
US$750 million to send military personnel
and other resources to the outbreak zone. 

Funding for many science agencies did 
rise slightly overall in the 2014 fiscal year 
after the mandatory sequestration in 2013. 
The National Oceanic and Atmospheric 
Administration’s budget actually rose by 
roughly $575 million in 2014, to $5.3 billion. 
But the National Institutes of Health (NIH), 
the world’s largest biomedical-research 
funder, received $29.9 billion, less than its 
pre-sequester budget of $30.7 billion. 

“Everybody in Congress knows that 
science is important,” says Congressman
Rush Holt (Democrat, New Jersey), “but
they don't have much appreciation of what it
takes to sustain it.” (Physicist Holt, one of the
few scientists in US public office, famously 
gives out bumper stickers that read, “My 
Congressman IS a Rocket Scientist!”) 

The NIH cuts have put the squeeze on 
many research institutions. At the Wash- 
ington University School of Medicine in 
St Louis, Missouri, for example, federal grant 
funding fell by 14% from fiscal year 2012 to 
2013, and by another 3% from 2013 to 2014. 
“Part of it was sequestration, part of it was 
the timing of some big grants expiring,” 
says Jennifer Lodge, the university's vice- 
chancellor for research. “It sort of reeks of 
an atmosphere of gloom around funding.”

The story is similar at the University 
of Arizona in Tucson, which relies heav- 
ily on support from the NIH and NASA. 
Although the university has begun seek- 
ing more funding from industry and 
philanthropists, “there's still a sense of 
unease about what the future looks like”,
says Kimberly Andrews Espy, the univer- 
sity’s senior vice-president for research. 

Others are more optimistic. The Uni- 
versity of Maryland in College Park saw 
its share of federal grants rebound in the 
2013-14 academic year, rising nearly 3% 
on the year before. Its chief research officer 
Patrick O’Shea says that success in chas- 
ing large awards has helped his institution 
to prosper as federal budgets have grown 
tighter. “Rather than complaining, we're 
trying to be more effective,” he says. 

Beyond the coming election, US policy 
analysts are already looking to another key
political test: whether lawmakers will finalize 
a 2015 budget deal before they depart for the 
year in late November. Jennifer Zeitzer, dep- 
uty director of public affairs at the Federa- 
tion of American Societies for Experimental 
Biology in Bethesda, Maryland, sees signs of 
a “good-faith effort” to approve a spending 
deal in the next several weeks. 

But even if the lawmakers deliver, she 
says, that will not make up for four years 
of legislative torpor. “This Congress was 
so spectacularly unproductive,” she says,
“that even showing up will give the next
Congress a leg up.”


PALAEOANTHROPOLOGY 


IN FOCUS | NEWS 


Oldest-known human 
genome sequenced 


DNA shows a group of modern humans roamed across Asia. 


BY EWEN CALLAWAY 


A 45,000-year-old leg bone from Siberia
has yielded the oldest genome sequence

for Homo sapiens on record — reveal- 
ing a mysterious population that may once have 
spanned northern Asia. The DNA sequence 
from a male hunter-gatherer also offers tantalizing
clues about modern humans’ journey from
Africa to Europe, Asia and beyond, as well as 
their sexual encounters with Neanderthals. 

His kind might have remained unknown 
were it not for Nikolai Peristov, a Russian artist 
who carves jewellery from ancient mammoth 
tusks. In 2008, Peristov was looking for ivory 
along Siberia’s Irtysh River when he noticed a 
bone jutting from the riverbank. He dug it out 
and showed it to a police forensic scientist, who 
identified it as probably human. 

The bone turned out to be a human left 
femur, and eventually made it to the Max 
Planck Institute for Evolutionary Anthropology 
in Leipzig, Germany, where researchers carbon- 
dated it. “It was quite fossilized, and the hope 
was that it might turn out old. We hit the jackpot,”
says Bence Viola, a palaeoanthropologist
who co-led the study of the remains. “It was 
older than any other modern human yet dated.” 
The luck continued when Viola's colleagues 
found that the bone contained well-preserved 
DNA, and they sequenced its genome to the 
same accuracy as that achieved for contempo- 
rary human genomes (Q. Fu et al. Nature 514, 
445-449; 2014). 

The researchers named their find 
Ust’-Ishim, after the district where 
Peristov found the remains. They dated 
him to between 43,000 and 47,000 years 
old, nearly twice the age of the next- 
oldest known complete modern- 
human genome, although older, 
archaic-human genomes exist. 

DNA may be the only chance to 
connect the remains to other humans. 
“This guy came out of nowhere — 
there's no archaeology site we could 
connect it to,” says Viola, suggesting that 
his group roamed far and wide. 

The Ust’-Ishim man was probably
descended from a now-extinct group that
was closely related to the humans who
left Africa more than 50,000 years ago to
populate the rest of the world, Viola says.


The Ust’-Ishim femur.

The most intriguing clue about his origin 
is that about 2% of his genome comes from 
Neanderthals. This is roughly the same level 
that lurks in the genomes of all of today’s non-Africans,
owing to ancient trysts between their
ancestors and Neanderthals. The Ust’-Ishim 
man probably got his Neanderthal DNA 
from these same matings, which, past studies 
suggest, happened after the common ances- 
tor of Europeans and Asians left Africa and 
encountered Neanderthals in the Middle East. 

Until now, the timing of this interbreeding 
was uncertain — dated to between 37,000 and 
86,000 years ago. But the Ust’-Ishim genome
pinpoints it to between 50,000 and 60,000 years
ago, on the basis of the long Neanderthal DNA
segments that it contains. Paternal and maternal
chromosomes are shuffled together in each
generation, so the DNA segments inherited from
any one ancestor become shorter over time. Long
segments therefore indicate that the interbreeding
happened relatively shortly before the Ust’-Ishim man lived.

The more precise dates for Neanderthal- 
human mating pose a challenge for scientists 
who have proposed that modern humans left 
Africa before 100,000 years ago and reached 
Asia more than 75,000 years ago, says Chris 
Stringer, a palaeoanthropologist at London's 
Natural History Museum. Those researchers, 
who include Michael Petraglia, an archaeologist 
at the University of Oxford, UK, have pointed to 
H. sapiens-like bones from the Levant that are 
older than 100,000 years and to 70,000-year-old 
stone tools found in India as evi- 

dence for an early human exodus 
to Asia along a southern coastal 
route that eventually reached 
Oceania and Australia. 
But Petraglia sees Ust’-Ishim’s 
genome differently. “I think this is part 
of a population boom that’s going on 
around 45,000 years ago, which means 
modern humans got to the ends of the 
world by 45,000 years ago,” he says. Their
numbers might have swamped human 
populations that arrived in earlier 
migrations. 

Petraglia expects that ancient DNA 
and other fossil finds will paint a much 
more complicated picture of the peopling 
of Asia. “This is just a random find in 
a Siberian river deposit,” Stringer says.
“What else could be there when they start
looking systematically?”
The Rho Ophiuchi cloud, a star-forming nebula, is being observed by the Kepler space telescope.


ASTRONOMY 


Sun’s stroke keeps
Kepler online 


Space telescope beats mechanical failures to begin a second 
mission that will trace new celestial targets. 


BY MARK ZASTROW 


The crippled Kepler space telescope is
unexpectedly enjoying a second lease of
life. The exoplanet-hunting probe will
now cast its gaze on star clusters, the centre
of the Milky Way and the Solar System’s outer 
planets as it scans a ribbon of the cosmos for 
the next three years. This month it has been 
gazing at gas clouds shrouding infant stars in 
the constellations Scorpius and Ophiuchus. 
The telescope, originally designed to look for 
Earth-like planets orbiting other stars in our 
region of the Galaxy, has just yielded the first
data set since its reincarnation after mechanical 
failures. Its science team is still busy analysing 
data from the initial planet-hunting mission, 
so Kepler’s managers at NASA have left it to 
the wider astronomical community to choose 
specific targets for a second mission, known as 
K2, and to comb through the output. 

For a patched-up instrument, it is doing 
remarkably well. Scientists are just getting to 
grips with the first K2 observations, made avail- 
able last month — and they say that the data are 
promising, leaving them eager for more. 

When Kepler launched in 2009, it sought to
answer one question: how common are other 
Earths in the Galaxy? The telescope stared at 
roughly 150,000 stars near the constellations 
Cygnus and Lyra, monitoring their brightness 
and watching for a momentary dimming that 
would signify planets crossing in front of them. 
Four years yielded more than 4,000 poten- 
tial planets, hundreds of them Earth-sized 
and rocky. At least one of those bodies was 
found in its star’s habitable zone, where liquid 
water could exist on its surface (see Nature 
http://doi.org/wf4; 2013). 

Crucial to Kepler’s success were its four 
reaction wheels, which held it steady like a 
gyroscope as it circled the Sun. It survived the 
loss of one wheel in July 2012. But when a sec- 
ond failed in May 2013, Kepler was down to 
two wheels, with only two axes of control. 

“Sorrow, disappointment, a little grief in 
there,” is how John Troeltzsch, Kepler programme
manager at Ball Aerospace, says he felt
when he found out about the second failure. His 
team had built and operated the craft for NASA 
at Ball’s facilities in Boulder, Colorado. Kepler’s 
mission seemed to be over. Then, three days 
later, he opened an e-mail from Ball engineer 
Doug Wiemer, proposing a fix. Five hours and 
eight e-mails later, Wiemer had outlined a plan 
to get Kepler back on its feet. 

Wiemer had fashioned a crutch for Kepler 
using the only resource available: sunlight. 
Positioned so that its long side faces the Sun, 
the spacecraft leans against the pressure cre- 
ated by the onslaught of photons and balances 
using its two good wheels. With this approach, 
the team hoped to get within a factor of ten 
of Kepler’s original performance — but with 
additional software refinements, NASA’s 
Kepler project manager Charlie Sobeck says 
that it is better than that, more like a factor 
of two or three. Wiemer thinks that further 
tweaks will close the gap entirely. 

One limitation of the K2 mission is that 
Kepler must keep the Sun side-on as it orbits, 
forcing the telescope to switch its field of view 
roughly every 80 days. This is not enough time 
to hunt for Earth-like planets around Sun-like 
stars, but it does let K2 track other celestial 
bodies such as clusters of newly formed stars
(see ‘Changing focus’). 

In February, Kepler will turn its gaze to the 
well-known clusters Pleiades and Hyades, fol- 
lowed in April by the Beehive and M67 clusters. 
Astronomers love these objects because the
stars within them are the same age, which is 
easily deduced by plotting their brightness and 
colour. The observations should offer snap- 
shots of planetary systems during their early 
development, which could resolve debates 
about how planets form and migrate. 

Other opportunities lie closer to home. In 
Kepler’s previous life, it discovered that Nep- 
tune-sized ice giants are the most common 
planets in the Galaxy. This year and next, it will 
point at the Solar System ice giants Neptune 
and Uranus, hoping to learn more about their 
internal structures by observing the flickers of 
seismic vibrations. 

The team has planned as far ahead as 
April 2016, when Kepler will look towards 
the centre of the Milky Way in search of mys- 
terious bodies called free-floating planets. 
Past observations suggest that Jupiter-sized 
planets outnumber stars in the Galaxy by a 
factor of two or more. Most can be detected 
only when they cross in front of a distant 
star and the planet’s gravity bends the star’s 
light like a lens. Kepler should be able to 
confirm the population of these loners, says 
Andrew Gould, an astrophysicist at Ohio 
State University in Columbus. Presumably, 
some free-floating planets were kicked out 
of their systems, but they are so massive 
that it is hard to imagine how, he says; one
possibility is encounters with other stars.


CHANGING FOCUS


The Kepler spacecraft will spend three years observing a
ribbon of the sky (blue line) as it orbits the Sun. Every
few months, it will pan to a new field of view (blue
crosses) aligned with the plane of the Solar System.

Other potential targets abound, including 
brown dwarfs — the smallest stars, which 
have clouds and storms like Jupiter — and 
white dwarfs, which are dim compared with 
Sun-like stars. This makes any planets hosted 
by white dwarfs easier to see, and thus they are 
tempting targets for future observatories such 
as NASA’s James Webb Space Telescope, which
is scheduled to launch in 2018. 

Some astronomers worry that K2’s produc- 
tivity will be limited not by the availability 
of data, but by NASA’s research grants. Only
US$2 million per year is guaranteed for Kepler
work; a separate $17-million pot is spread 
annually across all analyses of data archived 
from previous NASA missions. The move to 
increase archival funding is a slow culture shift 
for NASA, says Gregory Sloan, an astronomer 
at Cornell University in Ithaca, New York, but 
one that is necessary to give researchers the 
years they will need to continue reaping scien- 
tific benefits from missions such as K2. “The 
thing is, it’s a huge, rich data set,” he says. “And 
there will be years and years before we have 
really digested it.”


© 2014 Macmillan Publishers Limited. All rights reserved 




STRUCTURAL BIOLOGY 


Data bank struggles as 
protein imaging ups its game 


Hybrid methods to solve structures of molecular machines create a storage headache. 


BY EWEN CALLAWAY 


Structural biology, the mapping of complex


biological molecules such as proteins, is 
in the grip of a revolution. The field has 

long been dominated by X-ray crystallography, 
a technique made iconic by its role in decod- 
ing the DNA double helix in the 1950s. But the 
need to tackle more complex structures and to 
watch ‘molecular machines’ function in real 
time is fuelling a shift towards hybrid imaging 
methods that can create moving models. 

That is posing a challenge for the 
world’s official repository for protein 
structures: the Protein Data Bank (PDB), 
which relies almost exclusively on crystal- 
lography data and lacks the standards and 
software infrastructure to archive structures 
described by hybrid methods. This month, 
leaders of the four organizations around the 
world that host the data bank held a workshop 
in Hinxton, UK, to hatch a plan to ensure that 
hybrid models and their insights into funda- 
mental biology and disease do not get lost. 

Historically, structural biology has focused 
on generating three-dimensional (3D) descrip- 
tions of individual proteins. In many cases, this 
is a task perfect for crystallography, in which 
a molecule is bombarded with X-rays and the 
pattern of scattered radiation reveals the posi- 
tion of each atom. The technique underpins 
dozens of discoveries that led to Nobel prizes. 

Archiving structures in the centralized, free PDB is
crucial because it enables other researchers
to use them to address questions never imagined
by their discoverers. Most journals will
publish structures only if they have been depos- 
ited in the PDB. This year, the database topped 
100,000 registered structures, the vast majority 
of which were determined using X-ray crystal- 
lography (see Nature 509, 260; 2014). 

But in the past decade or so, structural 
biology has moved on. Researchers now want 
to describe intricate cellular structures made up 
of dozens, or even hundreds, of proteins that 
move relative to each other to do jobs such as recycling
proteins or copying chromosomes. These
molecular machines cannot be coaxed into 
the tidy, immobile crystals required for X-ray
crystallography. “These days, being a crystallographer
is not good enough,” says Gerard
Kleywegt, a structural biologist at the European
Bioinformatics Institute in Hinxton, who heads
the European annex of the PDB.

A subunit of a ribosome, a molecular machine.

Hybrid methods take an ‘everything but 
the kitchen sink’ approach to structural biology,
incorporating many different techniques.
Some can offer a dynamic view of a molecular 
machine in motion; for example, fluorescence 
resonance energy transfer measures the distance 
and interactions between proteins. Others, such 
as cryo-electron microscopy, can deliver near- 
atomic detail of entire complexes without the 
need to crystallize them. Computer programs 
then integrate the various bits of information 
— including data from crystallography-friendly 
proteins inside the molecular machine — to 
produce a 3D model that best fits the data. 

The scientific literature is now studded with 
products of the hybrid approach. In 2012, struc- 
tural computational biologist Andrej Sali of the 
University of California, San Francisco, and his 
collaborators used hybrid methods to describe 
the structure of the 26S proteasome complex 
(K. Lasker et al. Proc. Natl Acad. Sci. USA 109,
1380–1387; 2012), which recycles proteins and
may malfunction in neurodegenerative diseases 
such as Alzheimer’s. The researchers have now 
used the model to identify potential drugs
that alter the proteasome’s activity. This year,
another team published a hybrid model of the 
key HIV proteins that sneak the virus into a cell, 
which may help in vaccine design (M. Pancera 
et al. Nature http://doi.org/wfz; 2014). 

The hybrid approach has also tackled the 
ribosome, which produces proteins; the nuclear 
pore complex, which provides a gateway
between the genome in the nucleus and the
rest of the cell; and the molecular syringes 
made by bacteria that inject proteins 
into cells. Models of many more molecular 
machines are expected. “We're going to enter a 
period of exponential growth in the generation 
of these hybrid structures,” says Stephen Burley,
a structural biologist at Rutgers University in 
Piscataway, New Jersey, who heads one of the 
two US annexes of the PDB. 

At the PDB workshop, on 6-7 October, 
Kleywegt, Burley and three dozen others 
hashed out the challenges that these triumphs 
are creating for the PDB. Crystallography yields 
a standardized set of data files in which a struc- 
ture and its level of precision are self-evident; 
by contrast, the underlying data for the hybrid 
models exist in a mishmash of formats such as 
X-ray diffraction patterns or electron-micro- 
graph pictures. And going from raw data to a 
model involves more steps with hybrid methods 
than in crystallography; it also requires more 
assumptions, often leading to multiple possible 
ways of interpreting the results. 

Most workshop attendees agreed that it will 
be crucial for structural-biology databases to 
capture not just the hybrid models’ raw data, 
but also how the models were put together, so 
that other researchers can verify and build on 
them. But there are many questions, such as 
how to store and distribute the data sets, which 
are much larger than crystallography files. 
The meeting ended with an agreement to seek 
funding for a new bank centred on molecular 
machines — and to come up with a name for it. 

It is imperative to find a way to curate hybrid 
structures if structural biology is to realize its 
potential, says cell biologist Jan Ellenberg of 
the European Molecular Biology Laboratory 
in Heidelberg, Germany, who led one of the 
teams that modelled the nuclear pore com- 
plex. Ultimately, he says, “we want to have the 
molecular structure of an entire cell. That’s still 
science fiction at the moment — but it’s some- 
where we can get to in 10, 20 years.”


THE ETHICS
SQUAD


Bioethicists are setting up consultancies for research — but some 
scientists question whether they are needed.

BY ELIE DOLGIN

Stacy Hodgkinson and Amy Lewin
had the best of intentions when they
enrolled the pregnant 15-year-old in
their study. The psychologists were
evaluating an educational programme
for young parents-to-be, and the teenager
met all the inclusion criteria: she was
15-32 weeks pregnant with her first child,
under 19 years of age, and her partner — who
did not live with her — was willing to participate
in the study. There was just one problem.
Dad was 24 years old, and according to local
laws he was guilty of child sexual abuse for
sleeping with a minor.

The couple had apparently lied to each other
about their ages, but not to Hodgkinson and
Lewin, both then at the Children’s National
Health System in Washington DC. This presented
a dilemma. The scientists had promised
the participants that their information would
be kept confidential. But did that trump their
legal duty to report the crime to the police?
And how would that affect the family?

“Here was a young father telling us he’d like
to be involved in his child’s life in a positive
way,” says Lewin, who is now at the University
of Maryland in College Park. Telling the
authorities, she says, “could potentially do
more harm than good”.

In search of moral and legal guidance,
Hodgkinson and Lewin contacted Tomas
Silber, a paediatrician who also runs a research
ethics consultation service, a ‘one-stop shop’
for advice on thorny research issues.

To Silber, the course of action was clear.
“There's only one thing you can do,” he says.
“You have to report it.” After explaining their 
legal obligations to the couple, Lewin and 
Hodgkinson told the police, who launched an 
investigation. The teen and her partner broke 
off contact with the researchers, and Hodgkin- 
son does not know whether the father main- 
tained a positive presence in the child’s or the 
mother’s life — which was ultimately the goal 
of their programme. “Sometimes you do the 
right thing, but the consequences aren't good,” 
says Silber. 

Ethical dilemmas in research are nothing 
new; what is new is that scientists can go to 
formal ethics consultancies such as Silber’s to 
get advice. Unlike the standard way that sci- 
entists receive ethical guidance, through insti- 
tutional review boards (IRBs), these services 
offer non-binding counsel. And because they 
do not form part of the regulatory process, 
they can weigh in on a wider range of issues 
— from mundane matters of informed con- 
sent and study protocol to controversial topics 
such as the use of experimental Ebola treat- 
ments — and offer more creative solutions. 

The consulting services are “a really new 
area”, says Joshua Crites, a research ethicist at
the Pennsylvania State College of Medicine in 
Hershey. “Even some of the most basic ques- 
tions get complicated really quickly, and it’s 
better to have a group of ethicists working 
together to sort this out.”

But many scientists either do not know 
that they exist or fear using them because 
they could add red tape to an already heavy
administrative burden. And this year, the
US National Institutes of Health (NIH) 
scrapped funding for a working group to 
support ethics-consultation services and to 
develop best practices for the profession. 

Although financial support could return in 
some form, ethicists are not waiting around 
for it. Benjamin Wilfond, director of the Treu- 
man Katz Center for Pediatric Bioethics at 
Seattle Children’s Hospital in Washington, 
has set up the Clinical Research Ethics Con- 
sultation Collaborative, a group of around 
35 bioethicists who hope to keep improving 
the consultation service model, even without 
NIH support. “There’s energy behind con- 
tinuing what we started,” says Holly Taylor, a 
research ethicist at the Johns Hopkins Berman 
Institute of Bioethics in Baltimore, Maryland, 
and a member of the group. 


HERE TO HELP 

IRB approval is required for almost all 
human-subject research in the United States. 
The foundations for current IRB practices 
emerged 40 years ago in the wake of numer- 
ous ethical lapses in research, including the 
infamous Tuskegee experiments performed 
in Alabama between 1932 and 1972, in which 
doctors allowed syphilis to progress untreated 
in hundreds of African American men. Today, 
IRBs are the main channels for policing ethics 
in academic medical studies. But their primary 
function is to ensure adherence to regulatory 
and legal requirements. They do not always 
include members with bioethics expertise, and
discussion of ethics sometimes takes the form
of box-ticking rather than careful deliberation. 

That is where consultants come in. Unlike 
IRBs, consultants can provide guidance 
throughout a study — not just at the point 
of regulatory review — and do so in a non- 
confrontational advice-giving capacity. They 
offer “an open space for talking about research 
ethics in a way that is not driven by the regulatory
environment”, says Marion Danis, chief
of the bioethics consultation service at the 
NIH Clinical Center, a research hospital in 
Bethesda, Maryland. 

The Clinical Center was the first organiza- 
tion to launch a research ethics consultancy, 
in 1996, and a handful of academic medical 
centres followed suit over the next decade. 
Then, in 2006, the NIH launched the Clini- 
cal and Translational Science Award pro- 
gramme to enhance drug development and 
testing in academic settings, and it led to a 
rapid expansion of the concept in the United 
States. According to a survey published last 
year, by 2010 more than 30 academic institu- 
tions had set up research-ethics consultation 
services. That said, fewer than half of them had 
fielded calls by researchers seeking advice in 
the previous year, and just six got more than 
ten calls. “In most places, these have not ended
up being high-volume activities,” says Steven 
Joffe, a medical ethicist who led a fairly idle 
service at Harvard Medical School in Boston, 
Massachusetts, until moving to the University 
of Pennsylvania Perelman School of Medicine 
in Philadelphia in 2013. 


Amy Hagopian, a global-health researcher at the University
of Washington in Seattle, found herself turning to an
ethics consultant for help with a study in Iraq to find out how
many people had died as a result of the US-led conflict
that began there in 2003. Her team needed to
obtain informed consent from participants, 
but the researchers on the ground in Iraq were 
concerned that including the University of 
Washington's name on the consent forms — a 
requirement for IRB approval — would make 
it difficult to get the data they needed. “They 
feared that being associated with American 
institutions would get them killed”, says Hagopian.
“They dug in their heels and refused” to
carry the form. 

Hagopian wanted to strip the university’s 
name from the consent document, but the 
IRB insisted that it was an important part of 
informed consent, which is meant to protect 
participants, not the investigators. The impasse 
brought Hagopian and her team to Wilfond. He 
concluded that it would be ethical to remove 
mention of the institution, for three main 
reasons: first, research subjects would also be 
placed at risk by signing a document linking 
them to the University of Washington; second, 
apart from the link to the United States, the 
research involved minimal risk to the partici- 
pants; and third, the study would not happen 
unless the name of the institution was removed. 

The IRB eventually agreed with Wilfond. 
The researchers went ahead with the study 
and found that nearly half a million people 
had died from causes attributable to the Iraq 
war between 2003 and 2011 — a figure much 
greater than most previous estimates. “We
couldn't have done this without him,” Hagopian
says of Wilfond. 


WORLDLY ADVICE 

Of course, bioethicists have been providing 
advice about research for years, long before 
the NIH created a formal service. Outside the 
United States, ethics consultations mostly hap- 
pen through the regional equivalent of an IRB 
or take place in casual conversations or ‘kerbside
consults’. “All in all, it’s pretty ad hoc,” says
Mark Sheehan, who studies ethics at the Ethox 
Centre of the University of Oxford, UK. 

At some institutions in Canada, ethics advice 
about research studies can also be sought 
through the services that help patients and 
doctors to settle end-of-life decisions and other 
moral issues in health care. Unlike in the United 
States, where training programmes in research 
ethics and clinical ethics are usually separate, 
in Canada “we all tend to have both kinds of 
expertise pretty much”, says Ann Heesters,
a bioethicist at the Toronto Rehabilitation 
Institute in Ontario, one of the only Canadian 
hospitals that publicizes the availability of ethics
consultations for researchers. According
to Heesters, around one in every seven of her
consultations pertains to research. 

In Australia, “it’s very difficult for research- 
ers to be able to seek advice before they submit 
the full application” for official ethics review, 
says Nikola Stepanov, who studies research eth- 
ics and law at the University of Queensland in 
Brisbane. And if a human-research-ethics committee
— the Australian equivalent of an IRB
— finds ethical problems in a study’s protocol,
researchers may have trouble finding a formal 
channel for further guidance. 

“We’re obviously in the stage that the United
States was at before it brought in these ethics 
consultations,” says Stepanov. “Something 
more formalized would be very appropriate.” 

But not all ethicists agree that a separate 
service is needed, even within the United 
States. “If the IRB has the responsibility for 
ethics review, why are we pulling in someone 
else?” asks Susan Kornetsky, director of clini- 
cal research compliance at Boston Children’s 
Hospital in Massachusetts. Norman Fost, who 
studies ethical and legal issues in research at 
the University of Wisconsin–Madison, would
rather see bioethics panels folded into the 
standard IRB structure. Because IRBs are “a 
toll gate that everybody has to go through”, Fost
says, these panels, which would ideally include 
qualified ethicists, should “look at every single 
protocol and identify problems that nobody 
else has yet identified”. Relying on a separate, 
optional service means that some problems 
could be missed. “It’s the cases they’re not getting
called about that worry me,” he says.


COMPLEMENTARY SERVICES 

Advocates say that the aim of consultancy 
services is to complement IRBs and other 
oversight bodies, not to become entwined with 
them. “For innovative research designs, you
need some independent person to say, ‘Well,
let’s step back and think about this not just
from the standpoint of do the regulations
permit it, but does it fulfil the spirit of
what people want done with the public
research enterprise?’” says bioethicist
Steven Miles at the University of Minnesota
in Minneapolis.

Wilfond has been working to increase the 
visibility and the rigour of ethics consultan- 
cies. Last year, for example, he and Taylor 
launched a biannual series in the American 
Journal of Bioethics entitled ‘Challenging Cases 
in Research Ethics’. The latest case, from Silber
and his colleagues describing the obligation to
report statutory rape, was published in September.
Wilfond is also collecting descriptive
data about consultations and has expanded 
the reach of his service at the University of 
Washington by welcoming external requests 
— including from pharmaceutical companies, 
which typically employ armies of lawyers but 
rarely bioethicists. In such cases, the University 
of Washington consults on a fee-for-service 
basis: US$200 an hour for drug companies, 
less for non-profit organizations. 

The Stanford Center for Biomedical Ethics in 
California also works with drug firms. There, 
panellists provide their time and advice at no 
cost, on the condition that they can publish 
case studies. In 2011, for example, a start-up 
company approached the centre for guidance 
on the sale and promotion of a prenatal genetic
test that involves analysing fetal DNA circulat- 
ing in maternal blood (see Nature 478, 440; 
2011). The consultation led to an academic 
paper that called for amendments to informed- 
consent procedures and restrictions on the sale 
of direct-to-consumer tests.

“Many of our consults end up that way,” says
Mildred Cho, associate director of the Stanford
centre. “We do treat these things as scholarly
activity as well as a service.” Cho estimates that
around one-quarter of her service's cases come 
from the drug industry. 

Wilfond is currently working to expand the 
panels to draw in a wider range of views and to 
broaden the experience of panellists, a move 
that he considers one of his most innovative 
for ethics consultancies. In June, he was called 
into a meeting at Seattle Children’s Hospital 
with Ron Gibson, director of the hospital’s 
cystic fibrosis centre. Gibson had been gather- 
ing data from several studies that were using 
laboratory tests that can be performed only 
in a research setting or fall outside of recom- 
mended guidelines, but he was unsure whether 
he should incorporate the results into patients’ 
routine clinical care. Seven bioethicists from 
Wilfond’s collaborative telephoned into the 
meeting, ready to offer their take. 

As the consultation began, Wilfond 
explained that the point of bringing the ethi- 
cists into the discussion was twofold. First, it 
would offer Gibson a wider range of opinions, 
and second, it would expose the advisers on 
the phone to a case they might not otherwise 
have been involved in. “There’s a lot of learning
that goes on bidirectionally,” Wilfond says.
The hour-long meeting was “educational”, says 
Gibson, who has since implemented a new pol- 
icy for his research programme, although he 
declined to discuss specifics. “The spectrum of 
opinions on various levels of data sharing was 
reassuring that there is likely not one best way 
to address the issue.” 

Wilfond and his colleagues hope that more 
scientists and clinicians will start to see the 
benefits of their services. “There just hasn’t 
been an awareness of how important this is,” 
says Charles MacKay, a consultant in clinical 
and research bioethics in Bethesda, Maryland. 

But getting scientists to actually buy into 
such services may require a shift in attitudes. 
“Researchers generally have become mem- 
bers of a culture of research compliance,” says 
Christian Simon, a bioethicist at the University
of Iowa Carver College of Medicine in Iowa
City. They are responsive to what IRBs require, 
he says, but that sometimes means that they are 
unwilling to step back and consider the finer 
ethical details. 

“We're not the ethics police,” says Reid 
Cushman, co-director of the ethics consul- 
tation service at the University of Miami in 
Florida. “We're just another resource to help 
you stay out of trouble.”


Elie Dolgin is a science writer in Somerville, 
Massachusetts.


Tales of
the hobbit


In 2004, researchers announced the discovery of
Homo floresiensis, a small relative of modern humans
that lived as recently as 18,000 years ago. The ‘hobbit’
is now considered the most important hominin fossil in a
generation. Here, the scientists behind the find tell its tale.

The hobbit team did not set out to find a new species.
Instead, the researchers were trying to trace how ancient
people travelled from mainland Asia to Australia. At least
that was the idea when they began digging in Liang Bua, a large,
cool cave in the highlands of Flores in Indonesia. The team was
led by archaeologists Mike Morwood and Raden Soejono, who
are now deceased.


THOMAS SUTIKNA (field archaeologist in charge of the excavation): In 
1999, Mike came to our office and proposed excavating at Liang Bua. 
‘Liang Bua’ means cold cave. It’s 500 metres above sea level, and it’s situated 
very close to the confluence of two rivers, which provide natural resources 
like water and raw materials for stone artefacts. The roof is really high, 
providing good circulation. There's regular sunlight year round. It’s very 
suitable for habitation. 


RICHARD ‘BERT’ ROBERTS (geochronologist who conceived the dig with
Morwood): The excavations started off on a very small scale in 2001, but
we found some interesting things: bones of stegodons, which are these
now-extinct primitive elephants. There were lots of Komodo dragons, lots
of rat bones, all sorts of other species, including this kind of giant stork.
We didn't find anything spectacular until 2003.


WAHYU SAPTOMO (field archaeologist): Before Mike Morwood left for the 
season in 2003, I said, “Why are you leaving now? If you leave, maybe 
we will find something important.” A few days later, on 2 September, I
was supervising sector VII. Our local workers were digging at around 
5.9 metres. Their trowel met with a skull. A member of our team who 
specializes in animal and human bones came down and said, “Yes, I’m 
sure that’s a human bone. But it’s very small.” Thomas, he was sick and
was at the hotel that day. So I went back and met with him. I said, “We 
have something very important. We found the first hominid in the 
Pleistocene layer.” 




SUTIKNA: Immediately, my fever vanished. I couldn't sleep well that night. 
I couldn't wait for sunrise. In the early morning we went to the site, and 
when we arrived in the cave, I didn't say a thing because both my mind 
and heart couldn't handle this incredible moment. I just went down to
the pit and looked at the bones carefully. It would be impossible to get 
them out because of the condition of the bones. So we decided to cut 
the remains out, together with the sediment, block by block, and bring 
them back to the hotel. We needed several days to take out all the bones. 


ROBERTS: It was a very small body. That was the first thing that was imme- 
diately apparent — but also an incredibly small skull. We first thought, 
“Oh, it’s a child.” There was a guy who was working with us called Rokus.
He did all the faunal identifications of the bones. But Rokus said, “No, no, 
no, it’s not a child. It’s not modern human at all. It's a different species.” 


SAPTOMO: Thomas drew the skeleton on paper, and he faxed the drawing 
to Mike and to Professor Soejono in Jakarta. 


SUTIKNA: Mike called me at night. I couldn’t understand what he was 
saying over the phone, he was so excited. 


ROBERTS: Mike invited Peter Brown to come and look at the remains. 
Peter’s a very good palaeoanthropologist, but he’s kind of a difficult 
person as well. Peter can be kind of prickly. 


PETER BROWN (palaeoanthropologist): Mike doesn't know much about 
human skeletons, and the Indonesian researchers didn't either. I was
quite sceptical. The drawing may as well have been a Greek urn in terms 
of looking like anything much at all. 
I was interested and willing to go to Jakarta. It’s an interesting place 
to visit. I like the food. I like the atmosphere and the culture and everything
else, but I didn’t expect to find anything interesting or important.
At the most, I thought it was going to be a sub-adult modern human
skeleton, probably dating to the Neolithic period or maybe a little
bit earlier. The other possibility was a pathological individual, someone 
with a growth disorder. Those were my expectations when I turned up. 


ROBERTS: Peter’s as sceptical as I was, probably thinking, “A new human 
species? Sure, probably Mike getting overexcited in Jakarta. He’d been
in the bush for too long.” Good on him for flying over there straight
away, because most people have got teaching commitments and things 
you've got to be getting on with. 


BROWN: I walked into the laboratory with Mike and the lower jaw — 
the mandible — had been cleaned. And it was in about six seconds, 
maybe less, of looking at the lower jaw, I knew it couldn't have been a
modern human lower jaw. I knew it had to be from another species, 
and things went on from there. I started cleaning the skull and doing 
other work on the collections. Everything was very, very soft and had 
to be dried out and coated with preservative. It would have been very 
easy to scratch or smash. If you’d stepped on it, you would have ended
up with a pile of mashed potato, more or less. 


ROBERTS: Some people, like the guys in Africa, seem to work on things 
for about 10 or 15 years before you finally get a fossil description. Peter 
was working at lightning speed by comparison. To Mike and myself it 
still seemed to take forever. 


BROWN: I smuggled some mustard seeds through customs for the 
purpose of measuring the volume of the brain. So I cleaned it all as 
carefully as I could. I turned it upside down, and I poured the seeds 
in it. I’d taken enough seeds to measure the size of a modern human
brain, say 1.5 litres of seeds, but it only took about 400 millilitres. I was 
flabbergasted. The last time things with a brain that size walked was 
around about 2.5 million to 3 million years ago. It was not making any 
sense at all. I recorded it a second time, a third time. Mike and Thomas 
are looking at me and wondering why I'm going a bit pale. I was trying 
to push more seeds into the skull with my finger to try and increase 
the volume, because it was insane really. 


ROBERTS: The carbon dates came in and they were around 18,000 years. 
So at that point it was, “Oh, this is absolutely bizarre.” This was a very
primitive-looking human who was living this
side of the last glacial maximum, this side of 
the last Ice Age. 


BROWN: If Mike had said he’d found evidence of
an alien spaceship on Flores, I would have been 
less surprised. 


The team soon determined that the skeleton
belonged to a female just over a metre tall.
They dubbed her LB1. Brown and Morwood 
wondered whether the species was an offshoot of 
Homo erectus — an ancient relative of humans 
that originated in Africa about 2 million years ago 
and lived on Java, near Flores, until about 150,000 
years ago. If its descendants had survived on Flores
until the tail end of the last Ice Age, they might 
have shrunk in response to the island’s limited 
resources. Alternatively, the species might have 
been related to australopithecines, small-bodied 
hominins that roamed Africa more than 2 million 
years ago. Brown and Morwood knew that they 
needed to let the world in on their find. 


HENRY GEE (senior editor at Nature): I had no warning. Usually with 
these things you tend to get a little bit of scuttlebutt. But this one just 
came onto my desk one day in March 2004, and there it was. 


ROBERTS: Poor old Henry probably fell off his chair when he got the 
papers. 


GEE: I have to say at first it didn’t strike me as this most fantastic discov- 
ery. They had this strange creature and the tone of the paper was very 
subdued. When you're an editor you read between the lines, and the 
line was: “Help us. We don't know what this thing is. We're just going 
to describe it and we're going to give it a very non-committal name 
and see what you think.”


BROWN: I thought it was a new species and probably a new genus. I just 
thought it was so different. 


GEE: When it came to us, they had given it this Latin name, 
Sundanthropus floresianus — man from the Sunda region from Flores. 
Well, the referees said it’s a member of Homo so that’s what it should be, 
and one of the referees says floresianus actually means ‘flowery anus’ 
so it should be floresiensis. So Homo floresiensis came along. 


ROBERTS: We knew we had to come up with a name for publicity 
purposes. We couldn't call it Homo floresiensis, so Mike said, “I like 
hobbit.” I said, “Okay, as long as it’s not going to cause any problems with
Tolkien’s estate,” or whatever they're called. They can get pretty stroppy 
with people using their trademarked words. Mike referred to LB1 as 
hobbit, not ‘the hobbit’, as if its name was Mary. For a while, Mike was
trying to persuade Peter Brown to call it Homo hobbitus. I think he just 
thought Mike was a complete charlatan for even suggesting it. 


BROWN: Mike and I didn’t agree about nicknames because I thought it 
trivialized it, and I thought it would result in every loon on the planet 
telephoning me as soon as it was published. And that was true —
endless bizarre telephone calls from people who had seen some small hairy
person in their backyard.




When the papers reporting the discovery were
published on 27 October 2004 (28 October in
Australia and Indonesia), they grabbed public attention in 
a way few science stories manage. 


LEIGH DAYTON (science correspondent): It was huge, absolutely huge. 
Everybody was talking about it. Even my editors who absolutely do not 
like science whatsoever, they were fascinated. I’m just looking at the 
newspaper, the hard copy of the story I wrote for The Australian, and 
the other stories are all the usual political stuff, police probes, inflation 
figures, and then, “Small, but they’re only human”. 


BILL JUNGERS (palaeoanthropologist): I had to check the date to make 
sure it wasn’t April Fool’s Day. It was so preposterous on the surface 
that there could be this little hominin that evolved in isolation in 
southeast Asia for God knows how long and persisted until almost 
the Holocene. 


ROBERTS: This one really did garner just a massive amount of media 
interest. Way over the top. Every newspaper wanted to talk to you, TV 
programmes; everyone wanted to talk to everyone. 


BROWN: The press being the way they are, they always like controversy. 
It's no good just to have a good story. Nobody wants to read that, so 
they're always trying to find someone who disagrees. 


MACIEJ HENNEBERG (palaeoanthropologist): I had a phone call at 7 a.m. 
on 28 October 2004, from an Australian Broadcasting Corporation 
(ABC) journalist, who asked me, “What do you think about the new 
find?” I said, “I don’t think anything, you just woke me up.” He said it 
was published in Nature that there is this new species. I said, “Okay. 
Give me a few hours so I can find the papers.” While I was reading, I 
was reminded of a paper on a microcephalic [small-brained] skull from
Crete, about 4,000 years old. All the measurements of the LB1 skull
were not significantly different from this clearly pathological
skull from Crete. So at 11 a.m., I went on ABC radio and I said
I think that what was found was a pathological specimen. This
kind of explanation attracted a lot of attention.


Controversy erupted when Teuku Jacob, head
of Indonesia’s national palaeoanthropology institute, 
decided that the hobbit’s bones belonged in his lab. 


ROBERTS: Soejono invited Teuku Jacob to come over and have a
look at the bones, and then Jacob just put them in a suitcase and
walked out the door with them. Mike was absolutely ballistic.
I didn't think we’d ever actually get to see the bones back again.


BROWN: The real disreputable thing was they tried to take moulds 
and cast this material. I hadn’t done that because it was clear the
material was too soft and too fragile to take moulds. When they 
had done that, the lower jaw was broken, the skull was damaged. 


The bones made it back to Jakarta, but the debate
over the hobbit’s identity grew hotter. Morwood
brought in specialists to examine the fossil, and they 
agreed with him that it was a new species. Some 
key studies focused on endocasts — moulds of the 
inside of the hobbit’s skull that revealed details of 
its brain. 


JUNGERS: Mike didn’t have to ask twice. I was introduced to the team 
in person in Jakarta in 2006, and a good part of my career has been 
obsessing about this fossil ever since. 


ROBERTS: Quite a few on the American side started to throw their 
weight behind our team, and that helped steady the ship. They really 
took the hobbit to pieces and put it back together again, and found it 
was really a very unusual kind of animal. 


JUNGERS: I was able to assemble a nearly complete foot that was unlike 
anything I’ve ever seen in the fossil record. I think these little guys 
were climbers. I don’t know if you've ever been to Flores, but there 
were these huge Komodo dragons on the island when those guys were 
around. The adults don’t climb, so if I were a hobbit, I would find 
refuge in the trees. 


DEAN FALK (evolutionary anthropologist): Mike Morwood invited me 
to prepare and describe the endocast. I had a bias going into the study. 
I thought, because the brain was so small, that it was going to look like 
other primates with brains of that size, namely apes, and it didn't. It 
didn't look like a chimp brain. What it mostly looked like in overall 
shape was the endocast from Homo erectus. 


Other scientists have continued to support the idea that
the LB1 specimen was a diseased human. 


ROBERT MARTIN (biological anthropologist): I think there is something 
seriously weird about the LB1 specimen. The best I could come up with 
is microcephaly. There are hundreds of genes that can produce a small 
brain, with knock-on effects throughout the body. 


FALK: We felt like, okay, we need to do a microcephalic study. I ended 
up running with that ball and got together a very small sample, about 
ten, but it’s really hard to find ten endocasts from microcephalics.
We looked at LB1 and we showed there was no way it was a
microcephalic. As far as I’m concerned, that paper answered it,
and I think that persuaded most everybody, except a few people. I
think they even were persuaded eventually, because they changed
diseases. The ‘sick-hobbit hypothesis’, Bill Jungers called it.


JUNGERS: It seems like we've got a new one every day. It’s just crazy, 
crazy stuff. We spent a lot of time unfortunately having to deal 
with things like Laron syndrome and cretinism and other wild 
and woolly hypotheses. 


HENNEBERG: About two and a half years ago, it all clicked. I could see
that all signs of the bones were compatible with Down’s syndrome.
There are about 20 or so characteristics that are matching. There
is not a single characteristic of LB1 that doesn’t match.


MARTIN: I don’t think we’ve made much progress in ten years,
quite honestly. What we have is entrenched positions. We should 
be talking about the interpretations and the facts, not casting 
aspersions. I’m not an idiot because I’m questioning this. 


LESLIE AIELLO (palaeoanthropologist): There were some impor- 
tant issues raised during the controversy that the proponents of 
a new-species hypothesis had to address. But the criticism of the 
hypothesis didn’t turn out to hold any water. 


FALK: In palaeoanthropology there’s always controversy over new speci- 
mens. Always has been, always will be. I was a little surprised that in 
this day and age it would be so over the top. 


AIELLO: There were personalities involved. The field is full of a lot
of egos, and particularly male egos. 


ROBERTS: I think the view now is: yup, it’s not a diseased mod- 
ern human. But whether it’s a shrunken down version of a 
Homo erectus, or whether it’s something more ancient like a 
Homo habilis, or even an australopithecine who's managed to 
struggle out of Africa — that’s still pretty much up for grabs. 


BROWN: I’m mostly interested in how it got to be where it was, 
which will require the discovery of additional material. That 
may not happen in my lifetime. 


AIELLO: It'll happen. I tell my students that something is found every 
year and I never gave the same lectures twice. 


ROBERTS: That’s why we're still trying to dig up in the centre of the island, 
looking in the Soa Basin on Flores. Mike took another view: let's try 
and find the bones of the ancestors, wherever they came from, which 
was probably north of Flores. So Mike and I went to the Philippines, 
and we also went to Sulawesi, Indonesia. Mike was still doing excava- 
tions on Sulawesi, as are several of the people here. 


JUNGERS: I never met anybody who was as single-minded and sustained 
in their work as Mike. He was always looking over the horizon to the 
next excavation, the next expedition. I excavated at Liang Bua off and 
on, and the last time I saw Mike was that summer before he passed. 


ROBERTS: Mike actually came and saw me and said, “Ah Bert, I need to
talk to you about something.” He said, “I’ve got cancer.” He seemed to
be getting tired more and more easily, and that wasn't like Mike at all. 


JUNGERS: Mike died of complications of prostate cancer, and he was so 
consumed by his work that I think he neglected his health. It didn’t occur 
to him that he should take better care of himself. Even when he was diag- 
nosed, the only thing he wanted to talk about was the next expedition. 
He was an original and became a good friend, and I miss him. 


ROBERTS: Who would have thought ten years ago that Mike wouldn't 
have been with us now? He was one of the forces of nature. Without 
Mike, it wouldn't have happened. 


The hobbit team is still digging up Liang Bua. With new
dating work, the researchers hope to determine when
H. floresiensis went extinct and whether it overlapped with 
modern humans in the region. The hobbit’s discovery thrust 
southeast Asia to the forefront of research into human 
evolution, suggesting that key events might have happened 
there. But the find also complicated the history of Homo 
species in Asia. 


ROBERTS: We had such a nice simple story, where we had modern 
humans and Neanderthals, and we bumped them off, that was the 
end of Neanderthals. We ventured across southeast Asia and it was 
basically empty because Homo erectus had died out there already, 
and we sort of just wandered into Australia and there we go. It was a 
clean and almost crisp little story. It made nice sense. Everyone was 
happy with that. And then suddenly the hobbit pops its head up. 


BROWN: Now I’m more open to the idea that very small-bodied and 
small-brained bipeds moved out of Africa at a much earlier date, maybe 
3 million years ago, or earlier. I’m more open to the idea that there were 
lots of failures in the evolution of bipeds. Some were successful, some 
weren’t. It's a very branchy tree, and it just so happens we've survived.


ROBERTS: To me, the ultimate value of the hobbit is not what it is, in and 
of itself, because it’s just a dead end. It probably didn’t lead to anything 
that’s alive today. But it opened up the door for people to think more 
broadly about everything. I think the hobbit changed the way people 
thought. ■ SEE COMMENT P.427


Ewen Callaway writes for Nature from London. 
The adult skull of Homo floresiensis (centre) at the 2004 press conference announcing the species’ discovery.


Small remains still 
pose big problems 


Ten years after the publication of a remarkable find, Chris Stringer 
explains why the discovery of Homo floresiensis is still so challenging. 


In early 2004, the Australian palaeoanthropologist Peter Brown
teasingly e-mailed me pictures of a strange-looking skull,
asking what I thought it was. I knew that he
had been working in east Asia, so I guessed
that the images might represent the first discovery
of a very primitive member of our
genus, Homo, from somewhere like China.
Gradually, Brown revealed the even more
astonishing news of the skull’s remote location
and recent age. That October, he, Mike
Morwood and colleagues published analyses
in this journal with the controversial
proposal that the tiny skull and its associated
skeleton represented a new human
species. They named it Homo floresiensis,
which Morwood dubbed ‘hobbit’, owing to
its diminutive stature — a moniker that the 
global press quickly ran with. 

The researchers posited that a primitive hominin persisted
into the era of anatomically modern humans
and lived in Flores, part of the remote string
of Wallacean islands east of Java that have
remained isolated since their formation (see
‘How did the hobbit get to Flores?’). Controversy
about this species continues to this day,
including whether it even belongs in Homo.


UNEXPECTED TRIP 

In 2004, like most palaeoanthropologists, I 
thought that only modern humans (Homo 
sapiens — like us) had travelled beyond 
southeast Asia in the past 60,000 years. 
By then, people had devised sea-going
watercraft essential for such a journey. It
seemed unlikely that more-ancient humans
could have made such a voyage.

WHERE DOES THE HOBBIT BELONG?
More than a decade after scientists unearthed a startling tiny skull, debate
rages over which branch of the human tree bore Homo floresiensis.
Figure labels: H. floresiensis mixes modern and ancient features. The time axis starts at about 4 million years ago.

The excavations that first led to the idea 
that ancient humans did so began in 2001. 
Morwood, a New Zealand-born archaeolo- 
gist, led an international team in the huge 
Liang Bua (meaning ‘cool cave’) on Flores, 
hoping to find evidence of the earliest modern 
humans to colonize Wallacea, Australia and 
New Guinea. The project reopened trenches 
several metres deep from previous Dutch and 
Indonesian work. It soon yielded promising 
finds: stone tools that seemed to be more than 
10,000 years old, and fossils of a pygmy form 
of the extinct elephant-like Stegodon. 

In 2003, at a depth of around 6 metres, the 
team encountered a small skeleton (LB1) 
that they first thought must represent a mod- 
ern human child. Then they noticed other 
details: the wisdom teeth in its jaws had fully 
erupted, and the tiny skull showed definite 
brow ridges above its large eye sockets. 

The skeleton was dated from associated 
materials to less than 20,000 years old. 
Morwood and colleagues argued that it rep- 
resented a unique example of insular dwarf- 
ism in humans. This is a well-known process 
whereby large mammals isolated on islands 
evolve smaller bodies in response to limited 
resources and the lack of predators.

Morwood and colleagues argued that 
a population of Homo erectus could have 
travelled, perhaps by boat, to Flores from 
Java (500 kilometres away), where H. erectus 
was first identified in the 1890s. Java, having 
been repeatedly connected to the rest of Asia 
over the past 2 million years when sea levels 
were low, was thought to mark the farthest
extent from Africa of colonization by ancient
humans. Morwood and colleagues posited
that Flores’s ancient settlers underwent
island dwarfing, in parallel with other colonizing
mammals such as Stegodon. Stone
tools associated with Stegodon bones in Liang
Bua suggested that H. floresiensis could have
hunted and butchered these animals.

Figure labels (‘Where does the hobbit belong?’): Features of Australopithecus, including short legs, reinforced jaw and flared hip bones, are present in H. floresiensis. H. habilis, like the hobbit, had a small brain and reinforced jaw. Species shown: Homo rudolfensis, Homo habilis, Australopithecus afarensis, Australopithecus africanus, Australopithecus garhi.


ONGOING ARGUMENTS 
Stone tools discovered elsewhere on Flores, 
analyses of which were published in 2010,
suggest that potential ancestors of H. flo- 
resiensis could have been on the island a 
million years ago. But considering an island 
on the other side of the world — Britain — 
with its discontinuous record of human set- 
tlement over 900,000 years, I can also imagine 
episodic human colonizations on Flores. 

In 2009, a collection of studies analysed
LB1 in more detail, along with other fossils
attributed to H. floresiensis, including
a second jawbone (LB6), and fragments of
limb bones of up to eight more individuals.
Features such as LB1’s broad, flared
hipbones, short collarbone, and forwardly
positioned shoulder joint all resembled the
pre-human group known as australopithecines
(‘southern apes’), which includes
individuals such as the 3.2-million-year-old
skeleton of ‘Lucy’, comparable in size to LB1.

Figure labels (‘Where does the hobbit belong?’): Some argue that the small hobbit bones resemble an abnormal, dwarfed H. sapiens. A late group of H. erectus or an isolated early one could have borne H. floresiensis. All have similar brow ridges, thick skulls and brain shapes. Species shown: Australopithecus sediba, Homo sapiens, Homo heidelbergensis, Homo antecessor, Denisovans, Homo floresiensis, Homo neanderthalensis, Homo erectus. Legend: Debated. The time axis ends at the present.

These studies did not settle ongoing argu- 
ments about whether the finds represented 
a small, early human (a H. erectus shrunk 
through insular dwarfing) or an abnormal 
modern one, wrongly dated and analysed.
There were further problems: in late 2004,
Teuku Jacob, a now-deceased Indonesian
palaeoanthropologist, appropriated the specimens
to conduct his own work in Yogyakarta.
By the time the fossils were returned to
Jakarta, following international pressure,
some had been damaged irreparably.

The small brain of H. floresiensis has
provoked particularly fierce controversy.
Some, citing parallels in other dwarfed mammalian
species, argue that it could derive
from a H. erectus template, diminished but
human in structural organization. Others
rule out dwarfing, insisting that the braincase
is much smaller than would be expected if a
H. erectus body were scaled down. They argue
that the shape of the brain reflects pathology
— perhaps a condition called microcephaly.

HOW DID THE HOBBIT GET TO FLORES?
Early humans probably walked across Sunda. Modern
humans took watercraft to Sahul. How the ancestors of
Homo floresiensis arrived on Flores — never connected
to these ancient landmasses — is a mystery.
Map legend: Modern land boundaries; land boundaries at lowest sea level during the past million years.

H. FLORESIENSIS: HANDOUT/AAP IMAGE; H. HABILIS: CAROLINA BIOLOGICAL SUPPLY/VISUALS UNLIMITED/SPL; AUSTRALOPITHECUS: THE NATURAL HISTORY MUSEUM/ALAMY; H. ERECTUS: TOM MCHUGH/SPL

Various pathologies can explain some of 
the unusual aspects of the LB1 skeleton. But 
in my view, no syndrome so far proposed 
can account for the totality of evidence 
from Liang Bua. Neither cretinism, Laron 
syndrome nor Down’s syndrome duplicates
the full suite of features. 


CLASSIFYING THE HOBBIT 

From the beginning, Brown and Morwood 
were torn over how to classify the fossil. In 
the first drafts of their paper they even cre- 
ated an entirely new genus for LB1 to reflect 
its unique combination of human and non- 
human traits — ‘Sundanthropus floresianus’.
But in the face of insistent reviewers, they
shifted to the idea that their find was a
dwarfed version of H. erectus.

Both Morwood and Brown indicated 
later that they were not convinced by that 
model, and I join them in their doubts.
The tiny brain of LB1, its body shape, and
its foot, hand and wrist bones look more
primitive than those of any human dating to
within the past million years. Primitive traits
of the wrist bones and jaw are replicated in
at least one more individual from the site.
Like LB1, the LB6 lower jaw is small, lacks
a chin, and shows internal bony reinforcements
most like those in pre-human fossils
more than 2 million years old.

Ten years on, it is still very difficult to 
decide between competing views on where 
the hobbit came from (see ‘Where does the 
hobbit belong?’). Island dwarfing from a 
local H. erectus population is probably still 
the most widely accepted idea, although this 
would require the re-emergence of primitive 
traits as well as convergence on H. sapiens in 
features such as tooth size and shape.

A more primitive origin, from a more 
ancient H. erectus population (such as 
the 1.8-million-year-old fossils found at 
Dmanisi in Georgia) would require less 
extreme dwarfing, but would still need the 
re-emergence of primitive traits. An even 
more primitive template, closer to Homo 
habilis or the pre-human australopithecines, 
is a closer match for the reinforced jawbone, 
brain and body size, wrist morphology, and 
body shape, but would require still more 
convergences on later Homo morphology in 
features such as cranial thickness, retracted 
face and dental reduction. 

We need more bones from Liang
Bua to establish the morphological
variation of H. floresiensis and set pathological
explanations to rest. At present we do not
even know the extent of sexual dimorphism
in the species — would a male skeleton be
much larger and more H. erectus-like?

Liang Bua cave on the Indonesian island of Flores, the discovery site of Homo floresiensis.

Isotope studies and analyses of preserved 
dental tartar could help to reconstruct the 
H. floresiensis diet, and investigations of den- 
tal microstructure might place the species 
taxonomically, because primitive hominins
grew distinctly faster than H. erectus
and later humans. Even small amounts of
ancient DNA would greatly clarify its evolu- 
tionary history, but it will require both tech- 
nological breakthroughs and good fortune to 
acquire analysable samples from the warm, 
wet conditions of Liang Bua. 

Significant work on re-evaluating the 
dates of the site, fossils and archaeology was 
under way before Morwood’s untimely death 
in 2013. The results, due soon, will undoubt- 
edly affect our views of H. floresiensis, and 
when and why it went extinct. 


MORE SURPRISES 

I think that there are more surprises to come 
from the rest of Wallacea. If the ancestors of 
H. floresiensis reached Flores, perhaps they 
also dispersed to other islands, and the exper- 
iment in human evolution revealed in Liang 
Bua might have equally remarkable parallels 
elsewhere — for example on Sulawesi, the 
Philippines and Timor. As Morwood pointed 
out, the powerful currents around Indonesia
would have favoured transport from
Sulawesi (north of Flores) rather than from 
Java, where the nearest H. erectus fossils have 
been found. The possibility of accidental raft- 
ing on mats of vegetation in such a tectoni- 
cally active region must also be considered; 
in the 2004 Indian Ocean tsunami, some
people who survived on floating debris were 
dispersed more than 150 kilometres. 

If the H. floresiensis lineage had a more 
primitive origin than the oldest known 
H. erectus fossils so far identified in Asia, 
then we would have to re-evaluate the domi- 
nant explanation for how humans arose and 
spread from Africa. Most current thinking 
assumes that the first dispersal from Africa 
was just before the time of the Dmanisi fossils.
An ancient origin for the hobbit would
make that dispersal earlier and more complex.
It would mean that a whole branch
of the human evolutionary tree in Asia had
been missing until those fateful discoveries
in Liang Bua. ■ SEE NEWS FEATURE P.422


Chris Stringer is a research leader in 
human origins at the Natural History 
Museum in London, UK. 
COMMENT


Be prepared 


Scenario-based training for disasters is better than just drawing 
up a paper plan, say Jennifer K. Pullium and colleagues.


Two years after Hurricane Sandy hit the
US east coast, we are often asked what
we learned from the disaster. Some
lessons sound trivial, such as the importance 
of keeping headlamps ready and accessible. 
Some are worth stressing: we learned that 
we had a good contingency plan. But the 
most important thing we learned is that the 
way people act during a disaster can be more 
important than the written plan. 

We are veterinary surgeons and animal- 
laboratory managers at the New York Univer- 
sity (NYU) Langone Medical Center. Prior to 
Sandy, our emergency preparation comprised 
lectures and exercises. This served us well, 
but our contingency plan did not anticipate 
staff who care for lab animals having to break 
through a ceiling and lower a basket to trained 
hazard-response professionals to rescue 
laboratory mice. Ten days into the disaster, 
our team was stressed, cold and exhausted. 
We triaged, treated and transported mice for 
seven straight hours until late into the night. 
We rescued 600 cages with thousands of mice, 
many of them unique strains that investigators
feared were lost entirely.

Standard emergency preparations do not 
account for taxed and terrified minds, and 
tendencies to make poor decisions. The 
strength of the people in our team came 
from skills unrelated to their day jobs. But 
previous experiences came in handy — one 
of us (M.A.R.) served in the US Army, and 
another of us (J.K.P.) is a licensed pilot. Such 
training enabled us to make good decisions 
under pressure. The US military has trained 
soldiers for appropriate responses to stress- 
ful conditions for decades. Pilot instruction 
teaches people to efficiently use complex, sys- 
tematic approaches in times of crisis to assess 
risks, instruct crews and manage stress. 

These kinds of skills are not taught in 
typical disaster training. They should be.
In the months after the flood waters ebbed, 
we created a series of tactical decision games 
to improve employees’ abilities to lead 
responses and to assess and communicate 
situations. These go far beyond paper plans. 


SIMULATED STRESS 

Institutions typically develop plans while 
sitting around a conference table in a com- 
fortable room with refreshments. Practice ses- 
sions are run with the entire staff on a warm, 
sunny day. They do not prepare people to act 
decisively. When surveyed, most animal-resource
staff at our facility said that they
would attempt to contact upper management
before doing anything in an emergency. But
if leaders are absent during crises, indecision
could cause destructive delays. The best-written
disaster plan is not worth the paper
it is printed on without people on site able to
execute it.

A steel door pushed in by Hurricane Sandy at
the NYU Langone Medical Center.

Training can shape contingency plans to 
account for particular personalities. During 
our emergency response, we saw clear apti- 
tudes in two people who were employed as 
cage-washers, an entry-level position. In 
the days following Sandy, these two never 
hesitated. They invigorated and motivated 
their teams, and stayed positive and focused. 
Leading by example, they encouraged higher- 
ranking personnel to keep working, even 
when tired and hungry. They were integral 
members of our impromptu rescue team, 
evaluating the situation and making deci- 
sions. They determined the route to transport 
rescued animals to another NYU Langone 
facility, a trek that spanned several floors and 
buildings without power. Had we known of 
their capabilities earlier, we might have made 
them leaders of small teams from the start of 
the emergency. 

Training can empower personnel to feel 
competent to step into leadership voids. 
Too many training programmes look for 
responses guided by standard operating 
procedures, with predetermined ‘correct’ 
answers. By contrast, tactical decision games 
simulate stressful, challenging situations and
require participants to make choices without
full information or clearly correct answers.
Exercises that were initially developed for 
military use have been adapted for civilians 
by industrial psychologists such as Margaret 
Crichton. Such training is becoming standard 
in aviation, nuclear-power plants and medicine.
After coping with Sandy, our team created
a programme for animal-care facilities.


DECISION TIME 

Our exercises put trainees on their mettle. 
We allow too little time for people to assess 
disaster scenarios. Instead of analysing the 
situation, we demand that trainees state their 
next actions; we force them to make deci- 
sions on the spot. When a trainee says that he 
would call for help, we hand over a phone to 
hear what he would say. Then we ask what he 
will do when emergency responders or lead- 
ers reply that they are not coming. Everyone 
gets a turn in the ‘hot seat’ with individuals 
scrutinized both by the trainer and by their 
fellow participants. In our classes, distrac- 
tions such as air horns, darkness and flashing 
lights increase stress. 

People may not enjoy these exercises, 
but they do see the value. Working through 
shifting scenarios allows trainees to become 
confident in their thought processes and 
abilities. When people are reduced to bum- 
bling responses, this is framed as an area to 
improve. In our exercises, much as in a real 
disaster, a right answer is not essential; mak- 
ing a decision is. 

Such training can be easily incorporated 
into other team-building exercises in a busy 
work environment. We conduct disaster 
training annually. Staff learn crucial lessons
about how to lead in a crisis. Even if the skills 
developed are never put to use in a real dis- 
aster, facilities, people and animals are still 
likely to benefit. ■
Mary Somerville helped to unify science intellectually at a time of specialization.


IN RETROSPECT 


On the Connexion of 
the Physical Sciences 


Richard Holmes finds Mary Somerville’s breakthrough 
science best-seller thrillingly fresh, 180 years on. 


Less than two centuries ago, popular
science barely existed. In 1830,
astronomer John Herschel wrote
to natural philosopher William Whewell 
about the urgent need for “digests of what 
is actually known in each particular branch 
of science ... to give a connected view of 
what has been done, and what remains to 
be accomplished”. 


The remarkable writer who first achieved 
that “connected view” and arguably 
launched popular science writing was a 
self-taught Scottish mathematician, Mary 
Fairfax Somerville (1780-1872). Her book, 
On the Connexion of the Physical Sciences, 
published by John Murray in 1834 along- 
side works by Walter Scott, Lord Byron 
and Jane Austen, contains no equations,
few diagrams and little mathematics. But
it is a masterpiece of descriptive explanation
and analogy that
unveils a complete scientific world view,
covering everything from stars to insects. It
was Murray’s best-selling scientific publication
until Charles Darwin’s On the Origin
of Species in 1859; it eventually ran to ten
editions in Britain, and was published in
France, Italy, Germany and the United
States.

On the Connexion of the Physical Sciences
MARY SOMERVILLE
John Murray: 1834.

The book appeared at a critical moment 
for science. The disciplines were beginning 
to define their territories and societies were 
starting to coalesce, including the Geo- 
logical Society in 1807 and the Zoological 
Society of London in 1826. For Whewell, 
who scrutinized Somerville’s offering in 
depth in the Quarterly Review of March 
1834, the work was a “masterly” survey 
that performed the crucial task of intellec- 
tual unification at a moment when science 
threatened to become like “a great empire 
falling to pieces”. 

Somerville, the daughter of admiral 
William Fairfax, grew up near Scotland’s 
Firth of Forth wandering the seashore, col- 
lecting shells and studying seabirds. Her 
father described her as a savage. At 15 she 
saw a mysterious reference to algebra in 
a women’s fashion magazine, and began 
to devour the theorems of Euclid and 
Newton. She would lie in bed at night, lis- 
tening to the sea and solving equations in 
her head. 

She was launched into Edinburgh soci- 
ety at 18, a notable beauty; to her parents’ 
dismay she continued her study of maths, 
as well as painting and the piano, describ- 
ing herself as “intensely ambitious to excel 
in something, for I felt in my own breast that 
women were capable of taking a higher place 
in creation than that assigned to them in my 
early days”. A disastrous first marriage to 
glamorous naval attaché Samuel Greig was 
cut short by his early death in 1807. 

In 1812 she met and married her soul- 
mate and cousin, the scientifically minded, 
globe-trotting physician William Somer- 
ville. Moving to London in 1816, and 
now the parents of four children, the cou- 
ple entertained leading scientists such as 
William Herschel, Michael Faraday, Charles 
Babbage and Charles Lyell. In 1830, at the 
invitation of the Society for the Diffusion 
of Useful Knowledge, she translated French 
astronomer Pierre-Simon Laplace's highly 
technical The Mechanism of the Heavens. In 
a parliamentary debate on scientific educa- 
tion, she was referred to as “one of the only 
six persons in England who understands 
Laplace”. 

Somerville began writing On the Connexion
of the Physical Sciences in 1832,
during a long visit to Paris. She effectively
became an expert reporter on the latest
developments in European science. Taking
full advantage of social networking, she
contacted Laplace’s influential widow and
dined with the physicists François Arago,
Jean-Baptiste Biot and Joseph- 
Louis Gay-Lussac. She had 
privileged status at sites from 
the Paris Observatory to the 
National Museum of Natural 
History, and in the laboratories
of electrical-theory pioneers
André-Marie Ampère
and Antoine César Becquerel.

In contrast to the vague 
speculations of eighteenth- 
century natural philosophy, 
her 500-page book covers a 
tight field of hard sciences — 
astronomy, physics, chemis- 
try, geography, meteorology 
and electromagnetism. Its 
groundbreaking style, clear 
and logical, occasionally 
opens out into passages of 
sublime perspective, such as 
the description of universal 
gravity as a force equally present
“in the descent of a rain
drop as in the falls of Niagara;
in the weight of the air, as in
the periods of the moon”.
Somerville ranges over sub- 
jects from stellar parallax to 
terrestrial magnetism, from 
comets to giant seaweed. 

Her handling of acoustics 
is characteristically brilliant, 
based on the observations 
of John Herschel, Arago and 
naturalist Alexander von 
Humboldt. Comparing the 
propagation of sound to “a
field of corn agitated by a gust of wind”,
she goes on to describe phenomena from 
birdsong to thunder. She also suggests a 
connection between waves propagated in 
water, the atmosphere and sunlight, writ- 
ing: “Any one who has observed the reflec- 
tion of the waves from a wall on the side of 
a river ... after the passage of a steam-boat, 
will have a perfect idea of the reflection of 
sound and of light.” 

Her exploration of the solar spectrum 
contains one of the earliest descriptions 
(derived from work by William Herschel, 
chemist William Hyde Wollaston and 
physicist Johann Wilhelm Ritter) of infra- 
red and ultraviolet rays at the extreme 
ends of the known light spectrum, “too 
extensive in their undulations to affect our 
optic nerves”. She speculates that such rays 
might have many possible functions in the 
animal kingdom: “We are altogether ignorant
of the perceptions which direct the
carrier-pigeon to his home... or of those
in the antennae of insects which warn 
them of the approach of danger”. She also 
mused about climate change, the cause of 
earthquakes and the existence of planets 
beyond Uranus. 


French scientists making high-altitude measurements in 1804. 


The most original sections deal with elec- 
tricity and the new science of electromag- 
netism. Somerville thrillingly describes 
Faraday’s latest work with the horseshoe 
magnetic generator, establishing that mag- 
netism and electricity must have complex 
links in what he was beginning to define 
as ‘fields’. These sections clearly predict the
connection between all electromagnetic 
phenomena, established in four equations 
a generation later by physicist James Clerk 
Maxwell (see nature.com/maxwell). 

Somerville’s work contextualized the 
sciences as an ongoing global project. The 
book emphasized, in a wholly new way, 
the communal nature of science as shared
discovery, referring to [...] (pictured), the geographical explorations
of Lyell and Humboldt, and the teams of 
European astronomers who observed the 
return of Halley’s comet, among other feats. 

On the Connexion of the Physical 
Sciences was widely praised by journalists
and scientists in Britain
and abroad; both Arago and
Humboldt deeply admired 
it. The popular, large-cir- 
culation journal Mechanics’ 
Magazine urged its audience: 
“read it! read it!” Somerville 
dined at the male stronghold 
of the University of Cam- 
bridge, invited by its science 
professors; received honor- 
ary membership of the Royal 
Astronomical Society among 
others; and, although barred 
from the Royal Society, is 
commemorated there in a 
formidable marble bust. 

Such was her celebrity that
she wrote, “I am a kind of tame
Lioness at present.” Her friend 
the novelist Maria Edgeworth, 
however, noted that “while her 
head is up among the stars, her 
feet are firm upon the earth”. 
Somerville, privately unor- 
thodox and witty, sceptical 
but still believing in a creator, 
lived up to that estimation 
throughout her long life. She 
supported women’s suffrage 
(her signature was the first on 
philosopher John Stuart Mill's 
petition to Parliament); cam- 
paigned against vivisection, 
and against slavery in America; 
and believed in Darwinian 
evolution. 

Like the great poets of her 
era, Somerville brought a new vision into the 
world, and one that a broad, educated pub- 
lic could grasp. Seven years after her death, 
a new women’s college at the University of
Oxford was named in her honour. 

Looking back almost 40 years after the 
publication of her magnum opus, Max- 
well reflected: “It was one of those sug- 
gestive books which put into definitive, 
intelligible and communicative form the 
guiding ideas that are already working 
in the minds of men of science”. In fact, 
the book prompted the creation of a new 
professional concept, and a new umbrella 
word to define it, coined by Whewell in his 
review of 1834: “scientist”.

Warming goal: still
the best indicator 


David Victor and Charles Kennel 
challenge the practice of using 
global mean temperature as the 
main measure of danger from 
climate change (Nature 514, 
30-31; 2014). On the basis of 
40 years of science and policy 
research, there are good reasons 
why this temperature is the 
favoured indicator. 

It can be related through 
climate models to the regional 
impacts and risks that drive 
public concern (see go.nature.com/5chktj).
It is indeed
“related only probabilistically
to emissions”, but the authors’
best indicator — carbon dioxide 
concentration — is related only 
probabilistically to impacts and 
risks, except in the case of ocean 
acidification. As for ocean heat 
content, its trend experiences 
interruptions much like the global 
mean temperature, and bears 
no direct relationship to most 
impacts and risks. 

Compared with other 
proposals, global mean 
temperature is more closely 
related to outcomes for people 
and ecosystems. Without such 
a goal, we shall never know how 
much reduction in emissions is 
sufficient. 

Michael Oppenheimer Princeton 
University, New Jersey, USA. 
omichael@princeton.edu 


Warming goal: clear 
link to emissions 


David Victor and Charles Kennel 
argue that aiming to keep average 
global warming within 2°C of 
pre-industrial temperatures 
is neither politically nor 
scientifically useful (Nature 514, 
30-31; 2014). I disagree: global 
temperature change is the closest 
thing we have to a metric with 
a clear link to emissions; it can
also be related quantitatively to a 
range of local climate impacts. 
Because global temperature 
seems to respond linearly to 
cumulative emissions of carbon
dioxide (H. D. Matthews et al.
Nature 459, 829-832; 2009),
policies to cut emissions should
also reduce global temperature
change. This offers a simple
framework for estimating
a global carbon budget that
contains warming to within 2 °C. 
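
The budget logic in this paragraph can be sketched in a few lines. This is
a minimal illustration, assuming that warming scales linearly with cumulative
carbon emissions. The function names, the response value of 1.5 °C per
trillion tonnes of carbon and the 0.5 trillion tonnes already emitted are
illustrative assumptions, not figures taken from this letter.

```python
def total_carbon_budget(warming_limit_c: float, response_c_per_ttc: float) -> float:
    """Cumulative emissions (trillion tonnes of carbon, Tt C) compatible with a
    warming limit, assuming warming = response * cumulative emissions."""
    if response_c_per_ttc <= 0:
        raise ValueError("the temperature response must be positive")
    return warming_limit_c / response_c_per_ttc


def remaining_carbon_budget(
    warming_limit_c: float, response_c_per_ttc: float, emitted_ttc: float
) -> float:
    """Budget left after subtracting what has already been emitted (Tt C)."""
    return total_carbon_budget(warming_limit_c, response_c_per_ttc) - emitted_ttc


if __name__ == "__main__":
    # Hypothetical inputs: a 2 °C limit, an assumed response of 1.5 °C per Tt C,
    # and an assumed 0.5 Tt C emitted so far.
    total = total_carbon_budget(2.0, 1.5)
    left = remaining_carbon_budget(2.0, 1.5, 0.5)
    print(f"total budget: {total:.2f} Tt C, remaining: {left:.2f} Tt C")
```

Because the assumed relationship is linear, halving the temperature response
doubles the total budget; the uncertainty in that response therefore carries
straight through to the budget estimate.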

Policy goals should not have 
adverse effects on human and 
environmental welfare. Using 
global temperature avoids 
these too, because it seems to 
be an indicator of the extent of 
local climate changes (see, for 
example, M. Markovic et al. Clim. 
Change 120, 197-210; 2013). 
Furthermore, the average global 
temperature over decades relates 
well to many climate impacts and 
to Victor and Kennel’s ‘vital signs’ 
of planetary health (National 
Research Council Climate 
Stabilization Targets National 
Academies Press, 2011). 

Now that the international 
community has finally coalesced 
around the 2°C goal, compelling 
reasons are needed to interrupt 
this momentum. 

H. Damon Matthews Concordia 
University, Montreal, Canada. 
damon.matthews@concordia.ca 


Open access to Earth 
land-cover map 


China last month donated to 
the United Nations the first 
open-access, high-resolution 
map of Earth’s land cover, as a 
contribution towards global 
sustainable development and 
combating climate change. 

The map, known as 
GlobeLand30, comprises data sets 
collected at 30-metre resolution 
— more than ten times that of 
previous data sets. These data sets 
will be valuable for monitoring 
environmental changes and for 
resource management at global, 
regional and local scales (see also 
M. A. Wulder and N. C. Coops 
Nature 513, 30-31; 2014). 

The GlobeLand30 data sets 
are freely available and comprise 
ten types of land cover, including 
forests, artificial surfaces and 
wetlands, for the years 2000 and
2010. They were extracted from
more than 20,000 Landsat and 
Chinese HJ-1 satellite images (see 
www.globallandcover.com). 
GlobeLand30 will promote 
scientific data sharing in the 
fields of Earth observation and 
geospatial sciences. 
Chen Jun National Geomatics 
Center of China, Beijing, China. 
chenjun@nsdi.gov.cn 
Yifang Ban KTH Royal Institute 
of Technology, Stockholm, Sweden. 
Songnian Li Ryerson University, 
Toronto, Ontario, Canada. 


Sustainability: root 
targets in consensus 


Mark Stafford-Smith
urges scientists to engage
more effectively with the
United Nations’ Sustainable 
Development Goals to ensure 
that their environmental targets 
are quantifiable (Nature 513, 281; 
2014). When stakeholder values 
are diverse and passionately 
defended, however, such targets 
may not be easily agreed — 
leading to stalled negotiations 
and stagnant progress on issues 
of global significance. 

In our view, building consensus 
over desirable environmental 
outcomes would be a better 
approach. This involves analysing 
different possible outcomes, 
understanding decision-making 
processes and improving 
communication among 
stakeholders who have conflicting 
interests. 

Initiatives such as Future 
Earth and the Intergovernmental 
Platform on Biodiversity and 
Ecosystem Services are helping 
scientists to engage with 
international environmental 
policy. In the ongoing 
negotiations over the Sustainable 
Development Goals, scientists 
need to move on from simple 
information provision and help 
to develop appropriate policies. 
Sean Maxwell* University of 
Queensland, Brisbane, Australia. 
smaxwell@uq.edu.au 
*On behalf of 10 correspondents (see 
go.nature.com/tqxjyj for full list). 


Stop the cuts, not 
the evaluations 


Amaya Moro-Martin asserts that 
the European Science Foundation 
(ESF) supported a “flawed 
evaluation process” for research in 
Portugal (Nature 514, 141; 2014). 
This unsubstantiated allegation 
undermines the foundation's 
work and is detrimental to the 
many excellent reviewers and 
panel members involved in the 
evaluation process. 

The ESF champions the 
benefits to society from 
investments in research. We 
are very concerned about the 
increased pressure on many 
national science budgets. 
However, we believe that peer 
review, despite its limitations, 
is the most meritocratic and 
evidence-based approach to 
resource allocation. The work of 
those public-spirited scientists 
willing to give their time and 
energy to the peer-review 
process must be acknowledged, 
respected and supported. They 
should be allowed to undertake 
their work without interference. 

During the course of the 
independent research evaluation 
implemented for the Foundation 
for Science and Technology in 
Portugal, the ESF has witnessed 
an unprecedented level of direct 
interference with peers and panel 
members in the performance 
of their work. Even while the 
review process is ongoing, many 
have received intimidating 
communications designed 
to discourage them from 
completing their agreed tasks. 
This practice is unacceptable and 
damaging to science. 

It is in this context that we 
respond to Moro-Martin’s 
remark. Although no evaluation 
process is perfect, it is the most 
independent system yet devised. 
The ESF has carried out this 
evaluation project in accordance 
with good practice (see go.nature.com/o4xfuz;
to be updated on
completion of the project). 
Jean-Claude Worms, Jane Swift 
ESF, Strasbourg, France.


jswift@esf.org 


BRIEF COMMUNICATIONS ARISING 


The ‘mitoflash’ probe cpYFP does not respond
to superoxide


ARISING FROM E.-Z. Shen et al. Nature 508, 128-132 (2014); doi:10.1038/nature13012 


Ageing and lifespan of organisms are determined by complicated interactions
between their genetics and the environment, but the cellular
mechanisms remain controversial; several studies suggest that cellular
energy metabolism and free radical dynamics affect lifespan, implicating
mitochondrial function. Recently, Shen et al. provided apparent mechanistic
insight by reporting that mitochondrial oscillations of ‘free radical
production’, called ‘mitoflashes’, in the pharynx of three-day-old
Caenorhabditis elegans correlated inversely with lifespan. The interpretation of
mitoflashes as ‘bursts of superoxide radicals’ assumes that circularly
permuted yellow fluorescent protein (cpYFP) is a reliable indicator of
mitochondrial superoxide, but this interpretation has been criticized because
experiments and theoretical considerations both show that changes in
cpYFP fluorescence are due to alterations in pH, not superoxide. Here
we show that purified cpYFP is completely unresponsive to superoxide,
and that mitoflashes do not reflect superoxide generation or provide a
link between mitochondrial free radical dynamics and lifespan. There is
a Reply to this Brief Communication Arising by Cheng, H. et al. Nature
514, http://dx.doi.org/10.1038/nature13859 (2014).

Figure 1 | Spectroscopic and structural analysis of the ‘mitoflash’ probe
cpYFP. a, b, cpYFP fluorescence excitation spectra (emission at 515 nm)
(a) and emission spectra (excitation at 488 nm) (b) after addition of xanthine
(X; 2 mM), xanthine oxidase (XO; 100 mU ml⁻¹), bovine Cu/Zn superoxide
dismutase (SOD; 600 U ml⁻¹) and KOH (solvent control for xanthine; note pH
increase), and the in situ pH of the assayed 200 µl reaction mix. a.u., arbitrary
units. c, Cytochrome c reduction detected by absorption at 550 nm to measure
superoxide generation in response to the X/XO system in the presence and
absence of SOD, and in response to KOH as a solvent control for X.
Cytochrome c 100 µM; arrow indicates X or KOH addition. d, The response of
cpYFP excitation ratio (488/405 nm) to superoxide generation. Arrow
indicates introduction of X to constitute the X/XO system. Controls contained
either SOD addition or no XO. e, The same assays were performed with KOH
introduction as solvent control for X. f, g, cpYFP fluorescence excitation
spectra (f) and emission spectra (g) after 24 h incubation with dithiothreitol
(DTT; 10 mM) and 2,2’-dipyridyl disulphide (DPS; 1 mM), and the in situ pH
of the assayed 200 µl reaction mix. h, cpYFP fluorescence ratio after DTT and
DPS addition over 3 h; arrow indicates DTT or DPS addition. i, Cross-sections
through a surface model of cpYFP (left and middle). Chromophore (chr) and
cysteine residues are represented as ball-and-stick models. The Cys 171 thiol is
relatively close to the protein surface, but unlikely to be accessible to solutes as
indicated by the docking of a DTT molecule to the protein surface (right).
j, k, cpYFP fluorescence excitation spectra (emission at 515 nm) (j) and
emission spectra (excitation at 488 nm) (k) in response to pH as determined
in situ after the measurements. l, pH dependence of cpYFP excitation ratio
(488/405 nm; normalized to pH 7.0) as compared to the cpYFP part of the pH
sensor SypHer (ASypHer). m, Sectional views through volume models of
cpYFP and redox-sensitive GFP (roGFP2), with the clipping plane parallel to
and just above the chromophore phenoxy ring. The chromophore is
represented as a ball-and-stick model. Data in a-h and j-l are background
corrected, and experiments were repeated at least five times with consistent
results.

We carried out experiments with purified recombinant cpYFP sensor 
protein to test whether it responds to superoxide (Fig. 1a-e). Exposure
of cpYFP to a superoxide-generating system (xanthine (X) and xanthine 
oxidase (XO)) slightly changed the excitation and emission spectra. How- 
ever, the same change occurred when cpYFP was incubated with the in- 
dividual assay constituents in the absence of superoxide production, or 
when Cu/Zn superoxide dismutase (SOD) was added to degrade superoxide
(Fig. 1a, b). The cytochrome c reduction assay confirmed that
superoxide is produced by the X/XO system, and is abolished by SOD 
(Fig. 1c). Xanthine is dissolved in potassium hydroxide, causing a small 
increase in pH after addition. There was an excellent correlation between 
spectral changes and resulting assay pH after xanthine (that is, potassium 
hydroxide) addition (Fig. 1a, b). In time course assays in which superoxide 
generation was started by the addition of xanthine (Fig. 1d), the addition of 
potassium hydroxide as the solvent control for xanthine (Fig. 1e) gave the
same increase in fluorescence ratio. Extended reductive or oxidative treat- 
ment with thiol redox agents (the reducing agent dithiothreitol (DTT) and 
oxidizing agent 2,2’-dipyridyl disulphide (DPS)) did not alter the spectral 
behaviour (Fig. 1f-h), consistent with structural information suggesting 
that both Cys residues are buried inside the mature protein and are 
unlikely to be accessible for thiol redox chemistry (Fig. 1i). Likewise, 
reductive pre-treatment with DTT under inert atmosphere, followed 
by DTT removal, did not affect the outcome of the superoxide assays. 
Further variation of experimental variables, including pre-incubation 
conditions, pH buffer systems and a 100-fold range of sensor concentra- 
tions, did not lead to any rapid, reversible change in cpYFP sensor signal 
required for superoxide-related mitoflashes, as long as the pH and halide 
ion concentrations were kept constant. 

Mitoflashes can be fully explained by the extraordinary pH sensitivity
of cpYFP, which has a pKa value of ~8.7 (determined by measuring fluorescence
after excitation at 488 nm, the wavelength at which flashes are
observed) and shows a >50-fold change in fluorescence ratio between
pH 7 and 10, similar to the structurally related ratiometric pH-sensor
SypHer (Fig. 1j-l). In the mitochondrial matrix, a resting pH (~7.9) close
to sensor pKa and a limited pH buffering capacity mean that even minor
perturbations will elicit a pronounced sensor response (Fig. 1a, b, d, e).
The cpYFP pH sensitivity is due to the structural perturbation caused
by the circular permutation. A large cleft in the β-barrel exposes the
pH-active phenoxy group of the chromophore (Fig. 1m, left), which is
concealed in non-permuted green fluorescent protein (GFP)-based biosensors
(Fig. 1m, right).
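
Why a pKa of ~8.7 makes the probe so responsive near a resting matrix pH of
~7.9 can be illustrated with a simple titration model. The sketch below is a
deliberate simplification and not the analysis used in this study: it assumes
that the chromophore behaves as a single titratable group
(Henderson-Hasselbalch) and that the 488-nm-excited signal is proportional to
the deprotonated fraction. The function names and the 0.4-unit pH step are
hypothetical choices for illustration.

```python
def deprotonated_fraction(ph: float, pka: float = 8.7) -> float:
    """Fraction of a single titratable group in the deprotonated form
    (Henderson-Hasselbalch); pka defaults to the value reported for cpYFP."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fold_change(ph_rest: float, delta_ph: float, pka: float = 8.7) -> float:
    """Relative change in the deprotonated fraction for a pH step of delta_ph.

    Assumes (for illustration only) that the measured signal is proportional
    to the deprotonated fraction.
    """
    before = deprotonated_fraction(ph_rest, pka)
    after = deprotonated_fraction(ph_rest + delta_ph, pka)
    return after / before


if __name__ == "__main__":
    # Resting matrix pH of ~7.9 (from the text) and a hypothetical 0.4-unit rise.
    print(f"fraction at rest: {deprotonated_fraction(7.9):.3f}")
    print(f"fold change for +0.4 pH: {fold_change(7.9, 0.4):.2f}")
```

In this simplified model a sensor resting below its pKa sits on the steep part
of the titration curve, so a modest alkalinisation roughly doubles the signal,
whereas the same step applied well above the pKa would change it very little.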

On the basis of this evidence using purified cpYFP and earlier studies
in cells and isolated mitochondria, the mitoflash phenomenon cannot
be attributed to bursts of mitochondrial superoxide. In accordance
with the pH responsiveness of the probe, recent work with different
sensors suggests that mitoflash events indicate brief periods of alkalinisation
in individual mitochondria, possibly as a result of acceleration in
proton pumping, triggered by mitochondrial fusion initiation and/or a
change in ion homeostasis.

The debate about the nature of mitoflashes has focused on in situ
evidence that has left space for interpretation on both sides. Critics have
pointed out the implausibility of ‘superoxide flashes’ on the basis of
mitochondrial energetics, the absence of a plausible chemical mechanism
for the reversible interaction between cpYFP and superoxide, and the
fact that the pH sensor SypHer also detects mitoflashes. These arguments
have been countered by data suggesting a correlation of mitoflashes
with the response of chemical probes for reactive oxygen species, the
notion that the pH probe SypHer may also respond to superoxide, and
the suggestion that a mitoflash represents a mixture of superoxide burst
and pH transient. Ultimate resolution of the debate has been hampered
by the use of different biological systems and the complexity of
mitochondrial physiology, where matrix pH and free radical release are
connected by the electron transport chain and linked to several other
parameters such as availability of respiratory substrates, membrane potential,
redox and ion homeostasis, and mitochondrial morphology.
Here we resolve the controversy by a thorough analysis of the fundamental
properties of the mitoflash sensor cpYFP. Previous work already excluded
the suggestion that the pH probe SypHer responds to superoxide. We
now provide definitive evidence that cpYFP itself does not respond to
superoxide and that flashes recorded by cpYFP do not represent superoxide
bursts. Of course, sudden changes in mitochondrial physiology
may still include altered free radical levels. Although the mitoflash
phenomenon may reflect an important feature of mitochondrial function
that deserves further mechanistic analysis, the interpretation of the events
by Shen et al. lacks a biophysical foundation and mitoflashes cannot serve
as evidence for free radical involvement in determining lifespan.


Methods 

cpYFP was purified from Escherichia coli Origami (DE3) and Rosetta 2 (DE3) 24 h
after induction at 20 °C and assayed at 10, 25 and 1,000 µg ml⁻¹ using a Jasco
spectrofluorimeter FP8300 and a BMG Labtech Clariostar plate reader. Detector gain
was adjusted for individual experiments. Buffers contained 100 mM NaCl, 1 mM 
Na EDTA and 100 mM Tris-HCl, pH 7.5 (for thiol redox and superoxide assays; 
degassed and under argon for thiol redox treatments) or 100 mM Tris-TES (for pH 
assays). All reagents were dissolved in assay buffer, except for xanthine (100X stock 
in 1 M KOH, base required for solubility) and xanthine oxidase (118 (NH4)2SO4 
suspension as delivered by Sigma). Protein structures (PDB entries 3078 and 1JC1) 
were rendered using PyMOL.


REPLYING TO M. Schwarzländer et al. Nature 514, http://dx.doi.org/10.1038/nature13858 (2014)


In the accompanying Comment, Schwarzländer et al. challenged our
recent study because they failed to reproduce our previous finding that
the fluorescence intensity of purified circularly permuted yellow fluorescent
protein (cpYFP) increases in response to oxygen and superoxide
anions produced by xanthine (X) plus xanthine oxidase (XO). Starting
from a ‘fully reduced’ state (incubation with 10 mM dithiothreitol for
>3 h) and in the presence of 75 mM HEPES, we demonstrated that
cpYFP exhibits a twofold fluorescence increase after oxygenation, and
an additional twofold increase after the subsequent addition of X plus
XO, which could not be accounted for by solvent (potassium hydroxide)-induced
alkalization. Furthermore, the xanthine plus xanthine
oxidase-induced increase in cpYFP fluorescence was reversed by Cu/Zn
superoxide dismutase (600 U ml⁻¹). We also found that the fluorescence
intensity of fully reduced cpYFP increased >fourfold after incubation
with 1 mM aldrithiol. Notably, recombinant cpYFP purified in
the absence of dithiothreitol treatment exhibits a high fluorescence comparable
to that of the fully oxidized state, indicating the high susceptibility
of cpYFP to oxidation in non-reducing environments. Therefore,
ensuring a fully reduced state of cpYFP is essential for the probe to sense
superoxide in vitro. This property is probably the reason that the probe
functions readily as a reversible superoxide biosensor when targeted to
the reduced environment of the mitochondrial matrix. Unfortunately,
from the brief description of the methods and limited data provided by
Schwarzländer et al., it is not possible to determine whether cpYFP
was fully reduced in their experiments, or whether sufficient precautions
were taken to prevent oxidation of the probe. Moreover, in our
experiments cpYFP was expressed in Escherichia coli BL21(DE3)LysS
cells, whereas Schwarzländer et al. used E. coli Origami, a trxB (thioredoxin
reductase) mutant strain that also lacks glutathione reductase needed
to fully limit cysteine oxidation, which could result in an increased
oxidative status of their purified cpYFP rendering it non-responsive to
superoxide.

Our data from intact cells demonstrate that, in addition to increasing
mitoflash frequency, aldrithiol and menadione application also markedly
increases basal cpYFP fluorescence intensity within the mitochondrial
matrix. In addition, nanomolar concentrations of nigericin, an
H⁺/K⁺ antiporter, stimulate mitoflash activity. These responses of
cpYFP in situ are unlikely to be attributable to mitochondrial alkalization.
We also found that the temporal profile of mitoflash events does not
always mirror the change of the mitochondrial membrane potential in
cardiac and skeletal muscle cells, and this contradicts the suggestion
that mitoflashes simply reflect increased proton pumping in response
to membrane potential depolarization.

Until now, structural information about how cpYFP senses superoxide
has remained a mystery. In a unique class of enhanced green fluorescent
protein (eGFP)-based calcium sensors, a reversible deprotonization of the
chromophore occurs owing to calcium binding to a negatively charged
site on the probe. We are investigating whether a similar mechanism
might underlie the reversible superoxide-sensing chemistry of cpYFP.

Despite the technical issue raised by Schwarzländer et al., the existence
of bursts of superoxide or reactive oxygen species (ROS) production
in respiring mitochondria has been confirmed by several independent
investigators using different probes. Two pH-insensitive ROS probes,
MitoSOX and 2’,7’-dichlorodihydrofluorescein diacetate, have validated
ROS increases during cpYFP-reported ‘flashes’. When used
individually to avoid possible fluorescence resonance energy transfer
(FRET) effects and spectral cross-contamination, these pH-insensitive
ROS sensors confirmed flash events of nearly identical frequency and
spatiotemporal properties as that observed for cpYFP. Quantification
of the respective contributions of superoxide and pH to mitoflash events
showed that a predominant superoxide component is coincident with
a modest alkalization of the mitochondrial matrix in muscle cells. Similarly,
a previous report used MitoSOX to confirm bursts of superoxide
during pH alkalization events in primary astrocytes transfected with
the fluorescent pH-sensor mitoSypHer. A recent report, which is co-authored
by two authors of the accompanying Comment by Schwarzländer
et al., used roGFP2 to detect spontaneous, short-lived oxidative
bursts that are accompanied by mitochondrial depolarization, transient
matrix alkalization, and reversible mitochondrial ‘contractions’, all of
which we previously documented for cpYFP mitoflashes. Furthermore,
in many cell types and tissues and even in living animals,
mitoflash activity is increased by oxidants (including menadione and
paraquat) and reduced by antioxidants (including mitoTEMPO and
SS31). Nevertheless, given the extreme diversity and plasticity of the
mitochondria proteome, the relative contributions of superoxide and
pH to cpYFP-reported mitoflash events may vary in a species-, cell-type-
and context-dependent manner.

It has become increasingly appreciated that mitoflash activity is a
complex phenomenon, comprising multifaceted and intertwined mitochondrial
processes including quantal superoxide production, membrane
depolarization, membrane permeability transition, NADH depletion,
matrix alkalization and swelling that masquerades as mitochondrial
contraction. Ample evidence supports the notion that mitoflash
activity serves as a novel and universal “frequency-coded optical
readout reflecting free-radical production and energy metabolism at the
single-mitochondrion level”. The continuing debate on what drives,
controls and contributes to these events does not change the fact that
mitoflashes reflect a fundamental physiological phenomenon linked to
energy metabolism and stress response, nor does it discount the significance
of our finding that mitoflash frequency predicts lifespan at a
very early age in Caenorhabditis elegans.

The Comment by Schwarzländer et al. focuses exclusively on the
controversy of cpYFP as a superoxide sensing probe, which was originally
demonstrated in several publications by Wang, Dirksen, Sheu and
Cheng, and therefore these authors were invited to respond to the Comment.
M.-Q. Dong and 11 authors of the original paper were not involved
in the research that led to the discovery of cpYFP as a superoxide
probe, so are not listed as authors (M.-Q. Dong was included in this
Reply as a corresponding author).


Tryptophan catabolism is unaffected in
chronic granulomatous disease 


ARISING FROM L. Romani et al. Nature 451, 211-215 (2008); doi:10.1038/nature06471


Chronic granulomatous disease (CGD) is an inherited disorder of phagocyte
function, caused by a genetic defect in NADPH oxidase (NOX2),
leading to an impaired ability of leukocytes to produce superoxide
(O₂•⁻); CGD subjects are susceptible to chronic infections and hyperinflammation,
although the mechanisms remain unclear. Romani et al.
reported an aberrant inflammatory response to pulmonary aspergillosis
as well as sterile Aspergillus fumigatus to be mediated by a defective tryptophan
catabolism to kynurenine caused by lack of O₂•⁻ in CGD mice.
Kynurenine is formed by indoleamine 2,3-dioxygenase-1 (IDO1) in a
reaction originally reported to depend on O₂•⁻ (ref. 3). Here we show that
NOX2 deficiency does not attenuate IDO1-mediated tryptophan catabolism
in human phagocytes and CGD mice with granulomas arising
from an inflammatory response to Aspergillus. There is a Reply to this
Brief Communications Arising by Romani, L. & Puccetti, P. Nature 514,
http://dx.doi.org/10.1038/nature13845 (2014).

Romani et al. concluded that IDO-mediated tryptophan catabolism
is blocked in CGD based on studies performed in p47phox−/− mice.
They reported increased kynurenine in granuloma-containing lungs
of wild-type but not p47phox−/− mice, and that interferon-γ (IFN-γ)
stimulates IDO activity in lung phagocytes from wild-type but not
p47phox−/− mice, despite the presence of IDO protein in both mouse strains.
However, recent studies have shown that cytochrome b5 rather than
O₂•⁻ activates cellular IDO. Therefore, we re-examined tryptophan catabolism
to kynurenine in several models of CGD, using Ncf1−/− (lacking


Figure 1 | Phagocyte NADPH oxidase activity is not required for IDO1
activity in inflammation. a, Lung PMN (purity > 85% by differential count)
were isolated from wild-type (WT), Ncf1−/− or Cybb−/− mice 24 h after
intraperitoneal injection of 7.5 mg per kg lipopolysaccharide O111:B4 (LPS). PMN
were then incubated for 48 h in RPMI medium containing 10% fetal bovine
serum in the presence of 200 U ml⁻¹ mouse recombinant IFN-γ and 400 µM
L-NG-monomethyl-arginine (nitric oxide synthase inhibitor). Consumption of
tryptophan (ΔTrp, black) and accumulation of kynurenine (Kyn, grey) in the
medium were then determined by high-performance liquid chromatography
(HPLC) (mean ± s.e.m., n = 3); insets show representative α-tubulin (50 kDa)
and IDO1 (42 kDa) proteins. PMN from CGD mice failed to generate O₂•⁻ upon
stimulation with IFN-γ (72 ± 9 versus 0 ± 0 nmol O₂•⁻ per h per mg protein
for WT versus Ncf1−/− or Cybb−/−), mean ± s.e.m., n = 3 (data not shown).
b, Lungs from WT and Ncf1−/− mice before (−) and 24 h after intraperitoneal
administration of 7.5 mg per kg LPS (+) were homogenized in 1 ml of 20 mM
phosphate buffer pH 7.2 containing 140 mM KCl and 2× complete protease
inhibitor cocktail (Roche), and Kyn determined by HPLC; insets show
representative α-tubulin (50 kDa) and IDO1 (42 kDa) proteins. Data are
mean ± s.e.m. of 3 mice in each group; *P < 0.05 (Mann-Whitney rank sum
test). c, Representative lung sections (5-µm thick) prepared from B10.Q and
Nef1""”") mice before (day 0) and 2, 4 and 6 days after intra-nasal instillation of 
5 ug of sterile hyphal cell wall from A. fumigatus, and stained with haematoxylin 
and eosin. Scale bar, 100 jum. d, Lung Kyn (circles) and 3-hydroxykynurenine 
(3-OH-Kyn, squares) in B10.Q (WT, black symbols) and Nef1”"”" mice (white 
symbols) before (day 0) and 2, 4 and 6 days after instillation of sterile hyphal 
cell wall of A. fumigatus. Lungs were homogenized, proteins precipitated 

with 4% trichloroacetic acid, the mixture centrifuged and the resulting 
supernatant neutralised (1 M sodium phosphate, pH 7.4) and subjected to liquid 
chromatography with tandem mass spectrometry (LC/MS/MS) using m/z 
209-146 and 225-208 transitions for Kyn and 3-OH-Kyn, respectively. 

Kyn and 3-OH-Kyn were separated on a 150 X 4.6 mm Luna C18 (2), 5 um 
column (Phenomenex) using a gradient generated by mobile phase A (0.1% 
formic acid) and B (0.1% formic acid in 100% acetonitrile). The results show 
mean + s.e.m., with n = 7-8 for Kyn and n = 4 for 3-OH-Kyn for each time 
point and genotype. *P < 0.05 indicates significant difference between 
Nef”) and B10.Q using two-way ANOVA followed by Sidak’s multiple 
comparison test. e, Representative western blots of lung homogenates from 
B10.Q (WT) and Nef1”""”" mice 6 days after instillation of sterile hyphal cell 
wall from A. fumigatus, showing phosphorylated Stat1 (Statl pY701; 91 and 
84kDa), Statlo (91 kDa), «-tubulin (50 kDa) and IDO1 (42 kDa) proteins. 

f, Cells from bronchoalveolar lavage fluid of B10.Q (WT) and Nef1”"""/ mice 
4 days after instillation of A. fumigatus were stained by cell surface markers, fixed 
and incubated with anti-Stat1 antibody (BD Biosciences). Expression of Stat] in 
CD11b'Ly6g* PMN was assessed by flow cytometry and is shown as median 
fluorescence intensity (MFI). Data are shown for individual animals as well as 
mean + s.e.m. *P < 0.05 (Mann-Whitney rank sum test). g, h, THP-1 cells (10° 
per well) were treated for 24h with the respective control, CYBB-specific siRNA 
(5’-CGGAGGUCUUACUUUGAAGUCUUU-3’, Invitrogen) (g) or NCF1- 
specific siRNA (sc-29422; Santa Cruz Biotechnologies) (h), followed by 48 h 
incubation in the presence of 400 U ml“! recombinant human IFN-y. 

Trp lost from (ATrp, black) and Kyn accumulated in the medium (grey) was 
then determined by HPLC. Generation of 0,” (white) was determined 

by cytochrome c reduction after treatment of the cells with 200 ng ml phorbol- 
12-myristate-13-acetate for 1h. The 100% control-values for g and h were 

15 + 3.4 and 51 + 9nmol O,~ per h per mg protein, respectively 

(mean = s.e.m., n = 3); insets show representative «-tubulin (50 kDa) and IDO1 
(42 kDa) proteins. *P < 0.05 (Mann-Whitney rank sum test). 
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the Ncfl protein, also known as p47"), Cybb”'~ (lacking the catalytic 
subunit of NOX2, also known as gp91?""*)° and Nef1”""" mice (with a 
single mutation in the Ncf1 gene, resulting in a defective NCF1 protein 
leading to a lack of NOX2 activity)®”. 

Polymorphonuclear leukocytes (PMN) from lungs of endotoxin-treated Ncf1−/−
and Cybb−/− mice failed to generate O₂•⁻ upon activation, yet these cells,
like wild-type cells, converted tryptophan to kynurenine (Fig. 1a).
Moreover, endotoxin treatment increased pulmonary kynurenine similarly in
wild-type and Ncf1−/− mice (Fig. 1b). To examine tryptophan catabolism in
situations of hyper-inflammation, we administered sterile Aspergillus
fumigatus to Ncf1m1J/m1J and their wild-type control mice (B10.Q),
differing at only a single Ncf1 mutation. As expected, this caused
pulmonary granulomas in B10.Q and Ncf1m1J/m1J mice, and these granulomas
resolved only in control animals (Fig. 1c). Strikingly, lung IDO1 protein,
kynurenine as well as 3-hydroxykynurenine were higher with defective NCF1
(Fig. 1d, e). This was associated with an increase in pulmonary and
bronchoalveolar lavage fluid PMN Stat1 protein (Fig. 1e, f), and lung
phosphorylated Stat1 (Fig. 1e). Phosphorylation of Stat1 is a major NOX2
downstream pathway⁸ that mediates IFN-γ-dependent IDO1 expression⁹.

The above studies imply that O₂•⁻ is not required for in vivo IDO activity.
Consistent with this, blood PMN isolated from CGD patients with a mutation
in the NCF1 or CYBB gene were unable to generate O₂•⁻, yet effectively
degraded tryptophan to kynurenine (data not shown), as also reported
recently by others¹⁰,¹¹. Similarly, knockdown of NCF1 or CYBB protein in
human monocytic THP-1 cells did not decrease tryptophan catabolism to
kynurenine, although it blunted O₂•⁻ formation by ~80% (Fig. 1g, h).

Our observation of increased, as opposed to decreased², IDO activity and
3-hydroxykynurenine (an indicator of kynurenine metabolism) in infected CGD
mice is consistent with studies reporting elevated plasma and urinary
kynurenine and 3-hydroxykynurenine in CGD patients¹⁰,¹². Also, gene therapy
with CYBB in a CGD patient resulted in clearance of Aspergillus infection
that was associated with restoration of 30% of normal NOX activity without
increase of IDO activity, as assessed by plasma kynurenine¹³. The
discrepancy between our cellular studies (Fig. 1a) and those of Romani et
al.² may be explained in part by nitric oxide inhibiting IDO activity¹⁴, as
we observed IDO activity only when nitric oxide synthases were blocked.
Moreover, the Ncf1m1J/m1J mice used here⁶ better reflect human CGD, in
which NOX proteins are expressed, albeit as non-functional mutants. By
contrast, in the Ncf1−/− mice used¹⁵ the protein is absent and the targeted
locus carries linked chromosomal fragment/s from the original 129-derived
embryonic stem cell, which could also vary in different backcrossed strains.

We conclude that IDO1-mediated tryptophan catabolism to kynurenine does not
require phagocyte NADPH oxidase derived O₂•⁻, nor is it defective in human
or mouse CGD, even under conditions where hyper-inflammation exists.
Therefore, blockade of this pathway is unlikely to explain the acute
pulmonary inflammatory response observed by Romani et al.².


Methods 


Lungs and lung PMN were isolated from mice and cells cultured as described
previously². Sterile Aspergillus fumigatus hyphal cell wall was administered
intra-nasally to B10.Q and Ncf1m1J/m1J mice⁶. Human blood PMN were isolated
from control or CGD patients by a Percoll gradient¹⁶. Knockdown of NCF1 or
CYBB in THP-1 cells was achieved by short interfering RNA (siRNA)
transfection. Tryptophan, kynurenine and 3-hydroxykynurenine in the medium
of IFN-γ-treated cells and in lung homogenates were quantified by
high-performance liquid chromatography⁴ and liquid chromatography-tandem
mass spectrometry (LC/MS/MS), respectively. Cellular NOX activity was
determined by cytochrome c reduction following stimulation with
phorbol-12-myristate-13-acetate.

All experiments involving animal and human subjects were performed with
approval from the institutional ethics committees. We thank the CGD donors
for their co-operation and donation of blood, and J. Ziegler for his help.
We also thank E. Lonnblom, E. Chryssanthou and K. Selva Nandakumar for their
help with the Aspergillus experiments. This work was supported by grants
from the National Health and Medical Research Council of Australia (1003484
and 1037879 to R.S. and 455395 to R.S. and B.H.C.) and the European
Commission Directorate-General for Research & Innovation
(HEALTH-F2-2012-278611).


Ghassan J. Maghzal*, Susann Winter’, Bettina Wurzer’, 

Beng H. Chong®, Rikard Holmdahl* & Roland Stocker? 

¹Vascular Biology Division, Victor Chang Cardiac Research Institute,
Darlinghurst, New South Wales 2010, Australia.
email: r.stocker@victorchang.edu.au

²School of Medicine, University of New South Wales, New South Wales 2052,
Australia.

³School of Medical Sciences and Bosch Institute, The University of Sydney,
Camperdown, New South Wales 2050, Australia.

⁴Department of Medical Biochemistry and Biophysics, Karolinska Institutet,
171 77 Stockholm, Sweden.

⁵Centre for Vascular Research, University of New South Wales, New South
Wales 2052, Australia.


Received 26 June; accepted 4 August 2014. 


1. Segal, B. H., Leto, T. L., Gallin, J. I., Malech, H. L. & Holland, S. M.
Genetic, biochemical, and clinical features of chronic granulomatous
disease. Medicine 79, 170-200 (2000).

2. Romani, L. et al. Defective tryptophan catabolism underlies inflammation
in mouse chronic granulomatous disease. Nature 451, 211-215 (2008).

3. Hirata, F. & Hayaishi, O. Studies on indoleamine 2,3-dioxygenase. I.
Superoxide anion as substrate. J. Biol. Chem. 250, 5960-5966 (1975).

4. Maghzal, G. J., Thomas, S. R., Hunt, N. H. & Stocker, R. Cytochrome b5,
not superoxide anion radical, is a major reductant of indoleamine
2,3-dioxygenase in human cells. J. Biol. Chem. 283, 12014-12025 (2008).

5. Morgenstern, D. E., Gifford, M. A., Li, L. L., Doerschuk, C. M. &
Dinauer, M. C. Absence of respiratory burst in X-linked chronic
granulomatous disease mice leads to abnormalities in both host defense and
inflammatory response to Aspergillus fumigatus. J. Exp. Med. 185, 207-218
(1997).

6. Hultqvist, M. et al. Enhanced autoimmunity, arthritis, and
encephalomyelitis in mice with a reduced oxidative burst due to a mutation
in the Ncf1 gene. Proc. Natl Acad. Sci. USA 101, 12646-12651 (2004).

7. Sareila, O., Jaakkola, N., Olofsson, P., Kelkka, T. & Holmdahl, R.
Identification of a region in p47phox/NCF1 crucial for phagocytic NADPH
oxidase (NOX2) activation. J. Leukoc. Biol. 93, 427-435 (2013).

8. Kelkka, T. et al. Reactive oxygen species deficiency induces autoimmunity
with type 1 interferon signature. Antioxid. Redox Signal.
http://dx.doi.org/10.1089/ars.2013.5828 (29 July 2014).

9. Thomas, S. R. & Stocker, R. Redox reactions related to indoleamine
2,3-dioxygenase and tryptophan metabolism along the kynurenine pathway.
Redox Rep. 4, 199-220 (1999).

10. De Ravin, S. S. et al. Tryptophan/kynurenine metabolism in human
leukocytes is independent of superoxide and is fully maintained in chronic
granulomatous disease. Blood 116, 1755-1760 (2010).

11. Jurgens, B., Fuchs, D., Reichenbach, J. & Heitger, A. Intact indoleamine
2,3-dioxygenase activity in human chronic granulomatous disease. Clin.
Immunol. 137, 1-4 (2010).

12. Heeley, A. F., Heeley, M. E., Hardy, J. & Soothill, J. F. A disorder of
tryptophan metabolism in chronic granulomatous disease. Arch. Dis. Child.
45, 485-490 (1970).

13. Hakkim, A. et al. Response: protecting against Aspergillus infection in
CGD. Blood 114, 3498 (2009).

14. Thomas, S. R., Mohr, D. & Stocker, R. Nitric oxide inhibits indoleamine
2,3-dioxygenase activity in interferon-γ-primed mononuclear phagocytes.
J. Biol. Chem. 269, 14457-14464 (1994).

15. Jackson, S. H., Gallin, J. I. & Holland, S. M. The p47phox mouse
knock-out model of chronic granulomatous disease. J. Exp. Med. 182, 751-758
(1995).

16. Stocker, R., Winterhalter, K. H. & Richter, C. Increased fluorescence
polarization of 1,6-diphenyl-1,3,5-hexatriene in the phorbol myristate
acetate-stimulated plasma membrane of human neutrophils. FEBS Lett. 144,
199-203 (1982).


Author Contributions G.J.M. designed and carried out most experiments. S.W. and 
R.H. designed and carried out studies involving A. fumigatus. B.W. carried out initial 
cellular studies and B.H.C. was responsible for studies involving CGD patients. R.S. 
conceived the study and wrote the manuscript with G.J.M. All authors read and 
contributed to the final version of the manuscript. 


Competing Financial Interests Declared none. 


doi:10.1038/nature13844 


23 OCTOBER 2014 | VOL 514 | NATURE | E17 


©2014 Macmillan Publishers Limited. All rights reserved 


BRIEF COMMUNICATIONS ARISING 


Romani & Puccetti reply 


REPLYING TO G. J. Maghzal et al. Nature 514, http://dx.doi.org/10.1038/nature13844 (2014)


After our initial observation of defective tryptophan catabolism in
experimental chronic granulomatous disease (CGD)¹, several laboratories have
been testing the indoleamine 2,3-dioxygenase (IDO1) competence of cells from
CGD patients. In most instances, they found no impairment in IDO1 competence
in terms of tryptophan catabolic activity in vitro by polymorphonuclear
leukocytes and monocyte-derived dendritic cells²,³, leading to the
conclusion that there is no obvious defect in the production of kynurenine
(the first by-product of tryptophan degradation), and hence in the
IDO1-dependent mechanism of tolerogenesis as a whole, in human CGD. In the
accompanying Comment⁴, Maghzal et al. report that tryptophan catabolism is
unaffected in chronic granulomatous disease, again by measurements of
kynurenine production.

However, a number of studies have now provided evidence that the assay is
not sufficiently informative as to the local versus systemic functioning of
the IDO1 mechanism, particularly considering the pleiotropic effects of IDO1
in vivo⁵,⁶. While we stand by our original observations in CGD mice with
lethal pulmonary aspergillosis¹, it should be noted that, in all of the
experimental models tested so far, lack of IDO1 competence does not result,
per se, in spontaneous inflammatory pathology. Yet, the functional defect
becomes obvious when mice not competent for IDO1 function are challenged
with an inflammatory noxa recognized more usually by Toll-like receptors⁷.
Additionally, a number of potential factors have now been identified that
further substantiate the concept that a global defect in IDO1 functioning
underlies the severe chronic inflammation in CGD, among which are local
accumulation of peroxynitrites⁸ (which compromise IFN-γ signalling necessary
for IDO1 induction⁹) and IL-6 (ref. 8) (which promotes IDO1 proteasomal
degradation¹⁰), lack of IDO1-dependent neutrophil apoptosis¹¹, loss of
IDO1-driven non-canonical NF-κB activation (otherwise resulting in
downregulation of proinflammatory cytokines, and upregulation of tolerogenic
TGF-β), and probably defective IDO1 signalling activity¹².

Thus the problem is not whether cells from CGD patients and mice equally
display defective tryptophan catabolism in vitro. Rather, when
contextualized to the current knowledge, it is a matter of appreciating that
the IDO1 mechanism, and the multiple downstream regulatory responses over
which it presides (including control of the over-reactive responses to TLR
signalling), are globally compromised. The situation (that is, elevated
rather than suppressed circulating kynurenine and/or in vitro production in
CGD patients⁴) may be similar to that of septic patients who display high
levels of circulating kynurenine in the face of defective overall IDO1
functioning¹³.

Substantial differences might occur between human and experimental CGD, and
p47phox-deficient mice with infection-related acute inflammatory lung injury
may well be an extreme condition. Such as they are, those mice provide a
sound proof-of-principle in their being a prototypic model of a specific
condition, exemplifying how in experimental CGD the IDO1 mechanism of
disease tolerance¹⁴ is severely compromised at sites where it would mostly
be beneficial to control local inflammatory reactions. The very nature of
granulation formation could be but one of the multiple phenotypic
manifestations of defective IDO1 functioning in infection-related
pathology¹⁵. This Reply has been written on behalf of the entire original
author list¹; most of the original authors are no longer working with us or
on the specific subject.


CLIMATE CHANGE


A crack in the natural-gas bridge 


Integrated assessment models show that, without new climate policies, abundant supplies of natural gas will have little 
impact on greenhouse-gas emissions and climate change. SEE LETTER P.482 


STEVEN J. DAVIS & CHRISTINE SHEARER 


Burning fossil fuels such as coal, gas and oil produces more than 80% of the
world’s energy and more than 90% of global carbon dioxide emissions. Slowing
and ultimately stopping climate change depends on decarbonization: the
transformation of the global energy system into one that does not dump CO₂
into the atmosphere. Because gas-fired power plants emit roughly half as
much CO₂ per unit of energy produced as coal-fired plants, the greatly
expanded gas supplies promised by new hydraulic fracturing (fracking)
methods have been celebrated as a means of cutting emissions¹. Progressive
substitution of gas for coal and oil can thus decarbonize the energy sector²
and serve as a ‘bridge’ to a more distant future when carbon-free,
renewable-energy technologies are more affordable and reliable than they are
now³. In this issue, McJeon et al.⁴ (page 482) uncover a serious crack in
the gas bridge: in the absence of new climate policies, increased supplies
of natural gas may have little effect on CO₂ emissions and could actually
delay decarbonization of the global energy system.

McJeon and colleagues’ findings reveal 
two effects. First, abundant gas makes energy 
cheaper, thereby encouraging higher energy 
consumption and discouraging investment 
in energy efficiency. Second, natural gas com- 
petes for market share not only with coal, but 
also with very-low-carbon energy sources such 
as renewables and nuclear. 

Previous studies have questioned the climate benefits of natural gas
relative to coal owing to the potential for the gas (mostly methane, a
greenhouse gas) to leak into the atmosphere during its extraction,
processing and transport⁵. More recently, researchers have begun to consider
the effects of abundant natural gas on CO₂ emissions in the broader context
of the energy marketplace⁶⁻⁹. McJeon and co-workers’ paper is the first
peer-reviewed study to do so on a global scale. It uses five independent
energy-economic models to simulate the effects of gas supplies on the global
energy system and on emissions of CO₂, methane, nitrous oxide and aerosols
such as sulphur dioxide and black carbon. Their study compares a
‘conventional’ gas supply with an ‘abundant’ case in which natural-gas
prices are halved, and evaluates the net influence of emissions on the
climate system in the two scenarios.


Figure 1 | Relative growth of gas and renewable electricity. The ratio of
natural gas to renewables used to generate electricity is sensitive to how
much inexpensive gas is available. The red and blue lines show the median of
this ratio across five energy-economic models used by McJeon et al.⁴ for
scenarios of abundant and conventional gas supplies, respectively, whereas
the shaded areas show the full range spanned by the individual models. For
cases in which less gas is available (that is, in the conventional
scenario), renewables as an electricity source begin to grow faster than gas
10 years into the 40-year modelling period. But when gas is abundant, its
use grows faster than that of renewables throughout the period modelled and
probably beyond it.

In all five models used by the authors, CO₂ emissions and their effect on
climate (climate forcing) scarcely differed between the conventional and
abundant scenarios. At most, abundant gas reduced cumulative CO₂ emissions
between 2010 and 2050 by 2%, and reduced climate forcing over the same
period by 0.3%. In several of the models, emissions and forcing actually
increased under the abundant-gas scenario. But the exact numbers, although
revealing, are less important than the overall insight: whether the goal is
avoiding CO₂ emissions or hastening the transition to an emissions-free
energy system, a global gas boom is not a replacement for energy and climate
policies.

Indeed, by replotting some of McJeon and colleagues’ results, it is possible
to observe the extent to which the availability of abundant gas delays the
transition to low-carbon, renewable energy sources such as solar and wind.
Figure 1 shows the ratio of the amount of gas to renewables used to generate
electricity in the authors’ models between 2010 and 2050. In the race
between fossil fuels and low-carbon energy, the lines in the figure (which
reflect the median of all five models) indicate which energy source is
gaining ground. In the abundant-gas scenario, the ratio never decreases:
gas-fired power pulls further and further ahead of renewable power
throughout the 40-year period. But in the conventional-gas scenario, the
ratio begins to decrease from 2020: renewables start catching up.
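
The ratio described here is simple to compute once per-model generation figures are in hand. The sketch below is purely illustrative: the function names and the toy numbers are assumptions made for this example, not values or code from McJeon et al. For each time step it takes the median across models of gas-fired to renewable electricity, then finds the first step at which that median ratio falls.

```python
from statistics import median


def median_gas_to_renewables_ratio(gas_by_model, renewables_by_model):
    """Return, per time step, the median across models of gas / renewables.

    Both arguments are lists with one entry per model; each entry is a list
    of electricity generation values (same units, same time steps).
    """
    n_steps = len(gas_by_model[0])
    ratios = []
    for t in range(n_steps):
        per_model = [
            gas[t] / ren[t]
            for gas, ren in zip(gas_by_model, renewables_by_model)
        ]
        ratios.append(median(per_model))
    return ratios


def first_decline(ratios):
    """Return the index of the first step where the ratio drops, else None."""
    for t in range(1, len(ratios)):
        if ratios[t] < ratios[t - 1]:
            return t
    return None


# Hypothetical generation values for three made-up models at three time steps.
toy_gas = [[2.0, 3.0, 3.0], [4.0, 6.0, 6.0], [6.0, 9.0, 9.0]]
toy_renewables = [[1.0, 1.0, 2.0], [1.0, 1.0, 2.0], [1.0, 1.0, 2.0]]

toy_ratios = median_gas_to_renewables_ratio(toy_gas, toy_renewables)
```

A result of None from first_decline corresponds to the abundant-gas case, in which the ratio never decreases; an index corresponds to the step at which renewables start catching up.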

McJeon and co-workers’ study assumes that there will be no policies intended
to reduce greenhouse-gas emissions or to support low-carbon energy other
than those already in place. Future work must carefully assess the
effectiveness of various policies in reducing greenhouse-gas emissions and
decarbonizing the global energy system. Similarly, the authors’ results are
probably sensitive to assumptions about the cost of low-carbon energy
technologies over time, and systematic analyses of such sensitivity could
inform energy funding and policies. Finally, further studies may be needed
to evaluate the extent to which natural gas could be used strategically to
complement and support variable renewable-energy technologies by providing
flexible back-up power that can ramp up quickly¹⁰. Such applications could
have very different implications for decarbonization and cumulative CO₂
emissions. Rather than simply building vast fleets of gas-fired power plants
that lock in another generation of “committed emissions”¹¹, if we get the
technologies and the policies right, natural gas might help us to cut
emissions by working with renewable energy sources, rather than against
them.

The integrated analysis of McJeon et al. makes it clear that emissions per
unit of energy is a poor measure of prospective energy sources. Differences
in emissions between energy sources, considered in isolation, may be
irrelevant given the complex feedbacks of the energy markets. Specifically,
the authors’ study is the most robust evidence yet that expanding supplies
of natural gas will not help us to avoid climate change and manage the
transition to renewable energy sources in the absence of an effective
climate policy.


A stamp on the envelope


A high-resolution crystal structure of the HIV-1 Env trimer proteins, in their 
form before they fuse with target cells, will aid the design of vaccines that elicit 
protective immune responses to this protein complex. SEE ARTICLE P.455 


ROGIER W. SANDERS & JOHN P. MOORE 


The surface of the HIV-1 virus is studded with envelope glycoprotein (Env)
spikes. The virus uses these trimeric complexes, which each contain three
gp120 and three gp41 subunits, to fuse with cells and initiate infection.
Sixteen years ago, Kwong and colleagues described the crystal structure of
the core of the gp120 subunit¹, but the lability of the complex in its
pre-fusion form meant that the trimer structure was not determined until
last year, when an engineered, stabilized and soluble version was used to
produce highly concordant structures by X-ray crystallography² and
cryo-electron microscopy³. Now, on page 455 of this issue, Kwong and
colleagues (Pancera et al.)⁴ report a crystal structure of the same Env
trimer at higher resolution, providing a better picture, particularly of the
gp41 subunits. Together, these structures²⁻⁴ help us to understand how the
Env trimer functions, how antibodies recognize it (or not), and how vaccines
exploiting this protein can be better designed.

HIV-1 fusion occurs when the gp120 components of Env trimers interact first
with CD4 receptors on a cell’s surface and then with a co-receptor (CCR5 or
CXCR4). The sequential receptor engagements drive the concerted
disentanglement of the intimate, but fragile, embrace between gp120 and
gp41. The ectodomain of each gp41 subunit (the region that extends out from
the viral membrane) contains six segments (A-F) that form two heptad-repeat
regions (HR1 and HR2). These segments eventually become two long helices in
the post-fusion structure, which is known as the six-helical bundle. Pancera
and colleagues’ pre-fusion structure shows that HR1 and HR2 are each split
up into two smaller helices connected by loops; together, the four helices
form a ring encircling the amino and carboxy termini of gp120 (Fig. 1). In
turn, these gp120 regions act as a ‘safety pin’ to prevent gp41 from
transiting to the energetically more favoured six-helical-bundle form.

The authors use their structure to make inferences about the conformational
changes in Env proteins that take place during fusion, adding detail to the
existing model of the process (Fig. 1b). When the cellular receptors are
engaged, the safety pin is removed in a two-stage process. First, the top of
the trimer opens up. The diminished constraints on the N-terminal segments
of gp41 and the space vacated at the trimer axis allow segment B to undergo
a loop-to-helix transition. The formation of the resulting long helix (HR1),
now containing segments A, B and C, punches the hydrophobic fusion peptide
at the N terminus of gp41 into the target-cell membrane. The concomitant
removal of one component of the four-helix ring weakens the links between
gp120 and gp41. Co-receptor binding then fully removes the safety pin. The
gp120 subunit probably now completely dissociates, eliminating any remaining
steric constraints on the melding of the viral and cell membranes. In gp41,
segments D-F extend to form the second long helix (HR2), and the formation
of the six-helical bundle provides sufficient energy to fuse the membranes.
Once enough individual trimers (probably around five⁵) have undergone these
transitions, the resulting fusion pore in the cell membrane allows the viral
core to enter the cell.


[Figure labels: Target-cell membrane; Viral membrane; Pre-fusion; Post-fusion]

Figure 1 | A model of HIV-1 fusion to target cells. HIV-1 Env proteins are
trimers of three identical protomers, each with a gp120 and a gp41 subunit.
The gp41 subunit comprises cytoplasmic (CT) and transmembrane (TM) domains,
and an ectodomain that has six helix-forming segments (A-F), a fusion
peptide (FP), a disulphide loop (DSL) and a membrane proximal external
region (MPER). a, Pancera and colleagues’ trimer structure⁴ (a single gp140
protomer is shown) contains the gp120 subunit (green) and most of the
ectodomain of the gp41 subunit (orange and purple), but omits other gp41
domains. The cysteine amino-acid residues (501C-605C) forming the engineered
disulphide bond¹² in the trimer are indicated. b, The structure, together
with previous data, helps to build a model of viral fusion to target cells.
In the pre-fusion protomer, helix segments A and C, and D and F are
interspersed by loop segments B and E, respectively. On binding to
cell-surface receptors, a long helix comprising segments A, B and C forms,
punching FP into the host-cell membrane. (The approximate location of the
I559P amino-acid substitution, which blocks the loop-to-helix transition in
segment B of engineered trimers¹¹ and thereby stabilizes the pre-fusion
structure, is indicated.) A second long helix of segments D, E and F then
forms and aligns with the other helix in a hairpin structure. The formation
of the trimer of hairpins (called the six-helical bundle) pulls the viral
and target-cell membranes together.

The human immune system can prevent 
HIV-1 from infecting cells by generating 
neutralizing antibodies that bind to vari- 
ous regions (epitopes) on the pre-fusion Env 
trimer. But the virus has evolved defences to 
evade both the generation and the binding of 
neutralizing antibodies. The most relevant are 
a dense array of protective glycans, which 
are less immunogenic than protein surfaces, 
and the ability to vary the trimer’s amino-acid 
composition. Pancera and colleagues’ structure 
displays just how efficient these obstacles are: 
one stunning image (see Fig. 6 in the Extended 
Data) shows the comprehensiveness of the 
shielding effect of its glycans compared with 
analogous proteins from influenza virus and 
respiratory syncytial virus. 

Even so, humans can produce antibodies that have extremely broad reactivity
against circulating HIV-1 variants. Most known epitopes targeted by such
broadly neutralizing antibodies (many of these epitopes include the
otherwise protective glycans) are present on the engineered BG505 SOSIP.664
trimers used to generate the existing and new Env structures²⁻⁴,⁶. Because
inducing such antibodies is a major goal of HIV-1 vaccine development, does
this mean that the problem is solved? Unfortunately, no, or at least not
yet.

There are two main elements in the development of vaccines based on Env:
making immunogenic proteins with epitopes that could elicit broadly
neutralizing antibodies, and forcing the immune system to respond to them.
Current trimer-based immunogens, termed gp140s, are rendered soluble for
vaccine development by deleting the transmembrane region of gp41 (ref. 7).
However, doing so creates a loose end at the C terminus, which may
destabilize inter-subunit interactions²⁻⁴. For most gp140s, the cleavage
site between gp120 and the gp41 ectodomain is deliberately eliminated to
‘stabilize’ the trimer⁷. However, that stability is illusory, because these
‘uncleaved’ gp140s predominantly adopt configurations in which
semi-dissociated gp120 subunits dangle loosely from the gp41 six-helical
bundle⁸⁻¹⁰. The outcome is reminiscent of the conformational changes
outlined above: the dissociation of gp120 from the gp41 ectodomain and the
latter’s transition to the six-helical bundle post-fusion conformation.

These stability problems, and the resulting 
loss of epitopes that elicit broadly neutrali- 
zing antibodies, are overcome in the BG505 
SOSIP.664 trimers through the introduc- 
tion of an amino-acid substitution (1559P)"! 
that prevents the loop-to-helix transition in 
segment B and of an engineered disulphide 
bond between cysteine residues 501 and 
605 (ref. 12) that pins the gp120 subunits to 
the four-helix ring of gp41 (Fig. 1). Cleavage 
of the trimers is also promoted, which 
seems to strengthen the association between 
gp120 and gp41 (ref. 9). Together, these
engineered changes maintain the soluble 
trimers in the pre-fusion conformation and 
preserve key epitopes for eliciting neutralizing 
antibodies” *’. 

The BG505 SOSIP.664 trimers, which are 
now vaccine candidates for human trials, 
are encouragingly immunogenic in animals, 
provoking a strong and consistent neutraliz- 
ing-antibody response to the BG505 virus, a 
neutralization-resistant strain of HIV-1. But 
they do not elicit broadly neutralizing antibod- 
ies. Future strategies to increase the breadth 
of the antibody response may involve reverse- 
engineering the immunogens on the basis 
of antibody evolution” and devising differ- 
ent ways of presenting them to the immune 
system, for example as particulate antigens”. 
Nowadays, structure-guided immunogen 


design is the best route to an effective vaccine”, 
and the new structural data will undoubtedly 
facilitate such improvements. By again placing 
their stamp on the envelope, the Kwong group 
has posted a frank warning to the virus.


Rogier W. Sanders and John P. Moore 

are in the Department of Microbiology and 
Immunology, Weill Medical College, 

Cornell University, New York, New York 
10065, USA. R.W.S. is also in the Department
of Medical Microbiology, Academic Medical 
Centre, University of Amsterdam, 1105 AZ 
Amsterdam, the Netherlands. 

e-mails: r.w.sanders@amc.uva.nl; 
jpm2003@med.cornell.edu 


1. Kwong, P. D. et al. Nature 393, 648-659 (1998).
2. Julien, J.-P. et al. Science 342, 1477-1483 (2013).
3. Lyumkis, D. et al. Science 342, 1484-1490 (2013).
4. Pancera, M. et al. Nature 514, 455-461 (2014).
5. Klasse, P. J. Virology 369, 245-262 (2007).
6. Sanders, R. W. et al. PLoS Pathog. 9, e1003618 (2013).
7. Forsell, M. N., Schief, W. R. & Wyatt, R. T. Curr. Opin. HIV AIDS 4, 380-387 (2009).
8. Guttman, M. & Lee, K. K. J. Virol. 87, 11462-11475 (2013).
9. Ringe, R. P. et al. Proc. Natl Acad. Sci. USA 110, 18256-18261 (2013).
10. Tran, K. et al. Proc. Natl Acad. Sci. USA 111, E738-E747 (2014).
11. Sanders, R. W. et al. J. Virol. 76, 8875-8889 (2002).
12. Binley, J. M. et al. J. Virol. 74, 627-643 (2000).
13. Medina-Ramirez, M., Sanders, R. W. & Klasse, P. J. Expert Rev. Vaccines 13, 449-452 (2014).
14. Schiller, J. & Chackerian, B. PLoS Pathog. 10, e1004254 (2014).
15. McLellan, J. S. et al. Science 342, 592-598 (2013).


This article was published online on 8 October 2014. 


Treatment by 
cell transplant 


Transplanting gene-corrected macrophage cells directly into the lungs of mice 
has been shown to effectively treat their pulmonary alveolar proteinosis, a 
hereditary lung disease also found in humans. SEE ARTICLE P.450 


MARY JANE THOMASSEN & MANI S. KAVURU 


Pulmonary alveolar proteinosis (PAP) is
a rare lung disease characterized by the

accumulation in the lung of white blood 
cells called alveolar macrophages that are full 
of surfactant — a compound of phospholipids 
and proteins that regulates surface tension in 
the lung — and of vast amounts of extracellular 
surfactant¹. Unravelling the cause of this disease,
which was first recognized in 1958, is a
story that began in 1994 with the serendipi- 
tous discovery” that mice lacking the protein 
GM-CSF, which is important for macrophage 
maturation and function, had a mysterious 
lung disease that resembled human PAP. In this
issue, Suzuki et al.* (page 450) add a chapter to 
this story, reporting that transplanting macro- 
phages that correctly respond to GM-CSF into 
the lungs of mice lacking the GM-CSF receptor 
effectively treats their disease. 

Studies of GM-CSF-deficient mice identi- 
fied the disease-causing defect as part of the 
process through which surfactant is broken 
down by macrophages in the alveolar region of 
the lung? (Fig. 1). And human studies revealed 
that, although alveolar macrophages from 
some people with PAP respond to GM-CSF 
stimulation in vitro®, the patients express 
antibodies that neutralize the protein’. PAP is 




Figure 1 | Macrophage transplant corrects hereditary PAP in mice. Macrophages are white blood 

cells that, among other functions, engulf and destroy cell debris. In the alveolar regions of the lungs, 
macrophages break down excess surfactant — a compound of phospholipids and proteins that is 
produced by alveolar cells to reduce surface tension, preventing lung collapse. But mutations that 

lead to macrophages lacking the receptor for the protein GM-CSF, which is essential for macrophage 
maturation and function, can cause the build-up of lung surfactant that is characteristic of the disease 
pulmonary alveolar proteinosis (PAP). Suzuki et al.* show that transplantation of macrophages expressing 
the GM-CSF receptor — either macrophages from wild-type mice or mutant macrophages that have 

been ‘gene-corrected’ ex vivo — directly into the lungs of mice treats the disease by allowing adequate 


surfactant breakdown. 


now classified into three forms: autoimmune 
(acquired), congenital (hereditary) and sec- 
ondary (linked mainly to cancers of the blood 
or systemic inflammatory diseases). All forms 
of PAP are associated with loss of GM-CSF sig- 
nalling owing either to deficiencies in active 
GM-CSF or to GM-CSF-receptor mutations, 
with the exception of some congenital forms 
that are associated with surfactant-protein 
abnormalities. 

Further work in mouse models and
human PAP samples elucidated the steps of
surfactant breakdown and the roles of key
proteins in alveolar-macrophage biology. These
included the transcription factors PU.1, which
is involved in the maturation of alveolar
macrophages, and PPARγ, which maintains lung
homeostasis. In contrast to macrophages in
other parts of the body, alveolar macrophages
express high levels of PPARγ, suggesting
that this protein has a special role in the lung.
Indeed, it has been shown that PPARγ is a
negative regulator of macrophage activation
and that its expression is stimulated by
GM-CSF. Alveolar surfactant is 90% lipid, and its
catabolism is now known to be regulated by a
signalling pathway involving GM-CSF, PPARγ
and the protein ABCG1 (refs 12,13).

Therapeutic options for PAP were also 
first identified in mouse models of the dis- 
ease. Initial studies showed that GM-CSF- 
deficient mice could be ‘cured’ either by the 
administration of exogenous GM-CSF“ or by 
overexpression of GM-CSF in epithelial cells 
of the respiratory tract'’. These findings led 
to a trial of treating people who have auto- 
immune PAP with high doses of GM-CSF, 
administered under the skin or by inhalation, 
although a subset of patients failed to respond 
to this treatment, possibly because of high lev- 
els of anti-GM-CSF antibodies in their lungs. 
An alternative approach to treating the auto- 
immune form of PAP is to use the monoclonal 
antibody rituximab, which blocks production 


of anti-GM-CSF antibodies’* and induces 
increased expression of PPARγ and ABCG1
(ref. 17) in some patients. 

However, the only treatment for patients 
with hereditary PAP is whole-lung lavage 
(irrigation), which requires general anaes- 
thesia. It has been proposed that hereditary 
PAP might be corrected by transplantation 
of healthy bone-marrow cells, which contain 
stem cells that can differentiate into normal, 
GM-CSF-sensitive macrophages, and this pro- 
cedure has indeed been successful in mice’’. 
But this approach requires prior myeloablation 
— the severe or complete depletion of existing 
bone-marrow cells to avoid rejection of the 
transplanted cells. Myeloablation is associated 
with a high risk of infection and death and so 
bone-marrow transplantation is not routinely 
performed in patients with PAP. 

Suzuki et al. sought to design a transplan- 
tation approach that circumvents the need 
for myeloablation. They transplanted mac- 
rophages taken from normal mice directly into 
the lungs of mice deficient for the β-subunit of
the GM-CSF receptor (which develop a disor- 
der identical to that in children with hereditary 
PAP owing to mutations in the receptor’s α- or
β-subunits), and found that this treatment
relieved the disease symptoms, normalized 
the expression of disease-related proteins 
and extended the lifespan of these mice. The 
authors then repeated the experiment using 
macrophages taken from GM-CSF-receptor- 
deficient mice that had been corrected ex vivo 
(by a process of lentiviral transduction) such 
that they expressed the β-subunit again, and
saw the same effect. 

The feasibility of translating pulmonary- 
macrophage transplantation into a human 
therapy is strongly supported by these and 
other recent studies in mice. A previous study 
had also demonstrated that transplantation of 
wild-type murine macrophage-progenitor 
cells into GM-CSF-receptor-deficient animals
effectively reduced hereditary PAP disease.
Suzuki and colleagues have improved on this 
approach by using gene-corrected cells from 
genetically identical animals, thereby avoid- 
ing the need for myeloablation and immuno- 
suppression. This bears greater resemblance 
to the method that would most probably be 
adopted in humans, which would be to take a 
patient’s own, mutated, macrophages, correct 
them ex vivo, and return them to the patient. 
Furthermore, the authors show disease remis- 
sion with lessening of symptoms in mice. 

Suzuki et al. also found that, although wild- 
type bone-marrow-derived macrophages 
cultured in vitro had different characteristics 
to alveolar macrophages, they adopted a lung- 
macrophage profile following transplantation 
into the lung. This result supported earlier 
work” indicating that local microenviron- 
ments provide signals that direct macrophage 
development. Future studies will be needed to 
determine the optimal dose of transplanted 
cells for people, the effect of intrinsically ele- 
vated GM-CSF levels associated with heredi- 
tary PAP and whether additional GM-CSF will 
need to be administered to promote survival of 
the transplanted macrophages. 

The therapeutic implications of this 
approach reach beyond the rare disease of 
hereditary PAP. One can imagine transplan- 
tation of autologous gene-corrected mac- 
rophages for the treatment of other diseases, 
such as HIV. Macrophages serve as a reservoir 
for the virus and individuals lacking a certain 
macrophage co-receptor are HIV resistant, so 
transplanting macrophages without the recep- 
tor may convey immunity to the virus. Because 
it seems that local environments provide cues 
for the development of macrophages with 
certain characteristics, the possibilities are 
almost endless once the gene mutations that 
are relevant to certain disease states are iden- 
tified. The use of whole-genome sequencing 
to identify aberrant genes in infants born with 
life-threatening conditions may further extend 
the options for macrophage-transplantation 
therapies.

Hurling comets around
a planetary nursery 


An analysis of hundreds of star-grazing comets in a young planetary system shows
that they form two families: a group of old, dried-out comets and a younger group 
probably related to the break-up of a larger planetary body. SEE LETTER P.462 


AKI ROBERGE 


The Solar System today seems a quiet,
orderly place. The planets are on nearly 
circular orbits, and large collisions with 
asteroids and comets are rare. But this placid 
middle age belies the Solar System's turbu- 
lent toddler stage nearly 4.6 billion years ago, 
when asteroids and comets were much more 
numerous and massive impacts contributed to 


the build-up of the terrestrial planets — Mer- 
cury, Venus, Earth and Mars. On page 462 of 
this issue, Kiefer et al.¹ open a window on this
chaotic late stage of planetary-system forma- 
tion with 8 years of data on star-grazing comets 
in the 23-million-year-old planetary system 
around the star β Pictoris (β Pic). The authors
show that the comets in this system are not all 
alike, but instead consist of two families with 
distinct dynamical and evaporative properties. 


Figure 1 | A family of fragments from a broken-up comet. Using 8 years’ worth of data on the
23-million-year-old planetary system around the star β Pictoris, Kiefer et al.¹ discovered nearly
500 individual comets passing between the star and our vantage point on Earth. They found a family
of comets in this sample that approached the star from a single direction and that probably came from
the break-up of a larger icy planetary body. This image of the Solar System comet
73P/Schwassmann–Wachmann 3 indicates that it has been broken up into a ‘string of pearls’, a
phenomenon similar to that discovered by Kiefer and colleagues but on a much smaller scale.

The β Pic planetary system is special.
Among its notable features are its young age’; 
its disk of gas and dust originating from the 
destruction of comets and asteroids’; its giant 
planet, β Pictoris b, one of the first directly
imaged exoplanets*; and its huge clumps of 
orbiting carbon monoxide, which indicate the 
presence of another giant planet far from the 
star®. Objects called falling evaporating bodies 
(FEBs) were also found early on®. Like a Solar 
System comet (Fig. 1), an FEB heats up as it 
approaches the star and starts to evaporate, 
releasing gas and dust. When one of these 
bodies passes through Earth’s line of sight to 
the star (a transit), the gas absorbs some por- 
tion of the starlight, producing time-variable 
absorption features in the stellar spectra. Each 
of these features indicates the transit of a star- 
grazing comet. This phenomenon occurs in 
the present-day Solar System, but it happens 
much more frequently around β Pic.

Kiefer et al. proved this with their large 
collection of optical spectra of β Pic obtained
using the HARPS instrument on the European 
Southern Observatory’s 3.6-metre telescope in 
La Silla, Chile. This instrument is more typically 
used to discover exoplanets using the radial- 
velocity technique, which measures the change 
in the velocity of a star caused by the gravita- 
tional pull of orbiting planets. But it is also well 
suited to detecting star-grazing comets through 
changes in the absorption lines of ionized cal- 
cium gas in the stellar spectra. In their 8 years 
of HARPS data on β Pic, Kiefer and colleagues
confirmed 493 individual comets. 

This large set of comets allowed the discov- 
ery of distinct families within the population. 
Measurements of the comet radial velocities 
relative to the star revealed two groups, one 
with a broad distribution of velocities (popula- 
tion S) and the other with a distribution that 
narrowly peaked at around 15 kilometres per 
second (population D). This means that the 
D comets are all approaching the star from 
a particular direction, whereas the S comets 
approach from a wide range of directions. The 
two populations had other distinct dynamical 
properties indicating that the D comets transit 
the star at greater distances than the S comets. 

Previous studies of the β Pic FEBs had
already hinted at this dynamical informa- 
tion’, but Kiefer and colleagues also measured 
the evaporative properties of the star-grazing 
comets. Because comets evaporate faster the 
hotter and closer they are to a star, one would 
expect the S comets to produce more gas than 
the D comets, which are farther from the star 
when they transit. But the opposite is true. The
authors used their data to calculate the evapo- 
ration efficiency for each comet, and showed 
that the D comets generate more gas than the 
S comets. The former are either physically 
larger than the latter or they are ‘fresher’, with
more exposed ices at their surfaces that can 
evaporate. The D comets’ characteristics are 
consistent with the idea that the comets are the 
fragments of a larger icy planetary body that 
recently broke up. Further work will be needed 
to determine exactly how recently. 

There is ambiguity about whether the 
D comets are fresher or simply larger than the
S comets. But either way, they are distinctly dif- 
ferent in both their dynamics and their physi- 
cal properties and hold clues to the extreme 
collisions occurring in the β Pic planetary
system. Kiefer and colleagues’ work adds to 
the transformation in our understanding of 
planet formation that has developed over the 
past three decades, from a slow, stately process 
to a dynamic and sometimes violent one.


Aki Roberge is at the NASA Goddard 
Space Flight Center, Exoplanets and Stellar 


Relax and come in 


During inflammation, lymph nodes swell with an influx of immune cells. New 
findings identify a signalling pathway that induces relaxation in the contractile 
cells that give structure to these organs. SEE LETTER P.498 


KARI VAAHTOMERI & MICHAEL SIXT 


Mechanical forces are key elements in
the developmental control of organ
size¹. Adjusting the size of lymph
nodes is a special challenge, because immune 
cells continually enter and exit the organs from 
the bloodstream, and the number of cells the 
nodes contain can rapidly increase during an 
immune or inflammatory response. Despite 
these dramatic changes in cellularity, the struc- 
tural backbone of lymph nodes — a network of 
stromal fibroblastic reticular cells — remains 
relatively stable’. In this issue, Acton et al.* 
(page 498) show that, during inflammation, 
immune cells called dendritic cells transmit a 
signal that triggers the physical relaxation of 
these stromal cells, thereby making space for 
more immune cells to enter the lymph nodes. 
This mechanical control aids rapid swelling of 
the organ, without substantial remodelling or 
proliferation of the stromal cells. 

Lymph nodes are the sites of key immune- 
cell interactions. In these organs, dendritic cells 
(DCs) present pathogen molecules (antigens) 
carried in from peripheral tissues to lym- 
phocytes (particularly T cells), which travel 
between lymph nodes through the blood and 
lymphatic circulation. In the lymph nodes, 
T cells rapidly fan out to scan the DCs for anti- 
gens that match their specific T-cell receptors. 
Although it is known that the same guidance 
cues attract DCs and T cells to a shared lymph- 
node compartment, it is unclear how the entry 
and exit of T cells is coordinated quantita- 
tively. It has been suggested’ that, on exiting 
the bloodstream, T cells remain in a waiting 
position until there is enough space to enter 
the lymph node. Such mechanical regulation 
might balance influx and efflux, but this must 


shift during inflammatory states to allow for 
swelling of the organ and intensified scanning 
of DCs by T cells. 

The network of fibroblastic reticular cells
(FRCs) permeates the lymph node like the
skeleton of a sponge. FRCs are the main pro- 
ducers of chemokines, the factors that attract 
DCs and T cells and keep them motile. The 
migrating cells also use the adhesive FRC sur- 
face as a guidance structure. In addition to 
these roles in orchestrating immune-cell traf- 
fic, the FRC network forms an interconnected 
system of micro-scale conduits, which are 
connected to the lymphatic system and bloodstream
and serve as a system for distributing
small molecules to resident DCs. Although 
FRCs have many characteristics of epithelial 
cells (which line the body’s surfaces and cavi- 
ties), they also express smooth-muscle actin®, 


Figure 1 | Lymph-node swelling during the inflammatory response. Immune cells, including


lymphocytes and dendritic cells (DCs), continually enter and exit lymph nodes. The number of cells that 
the node can accommodate is restricted by the available space, which Acton et al.’ show is regulated by 
the contractile state of the fibroblastic reticular cells (FRCs) that form a network providing lymph-node 
structure. a, In the steady state, the FRCs are kept in a contracted state by activity of the transmembrane 
protein podoplanin, which signals through ERM proteins to RhoA and GEF-H1 (not shown), inducing 
myosin-II-mediated contraction of the actomyosin protein network. b, During inflammation, activation 
of DCs induces expression of the surface molecule CLEC-2, which binds podoplanin and switches 

off this signalling pathway. This leads to relaxation of the FRC network, allowing the lymph node to 
accommodate more lymphocytes and to swell. 
a protein normally found in cell types that can
physically contract. 

Acton et al. show that FRCs do have 
contractile function and that they use this 
to tune the tension of the lymph node, dem- 
onstrating that it is not the capsule tissue 
surrounding the organ that restricts its size, but 
its internal scaffold. The authors investigated 
signalling in these cells through podoplanin, 
an FRC transmembrane protein that binds the 
receptor CLEC-2, which is induced on DCs 
following contact with pathogens. They found 
that, in the absence of CLEC-2 as a ligand, podo- 
planin transmits signals through proteins of the 
ERM family to the proteins GEF-H1 and RhoA, 
which regulate actin contractility mediated by 
the protein myosin II. This signalling maintains 
FRCs in a highly contractile state. Binding of
podoplanin by CLEC-2 caused podoplanin to 
redistribute to another membrane compart- 
ment, where it stopped signalling to ERM 
proteins and thereby relaxed actomyosin contraction.
This switched the cells from a contractile state
dominated by actomyosin filaments to
a protrusive, relaxed and elongated form. 

These in vitro findings suggested a plausible 
scenario for events in vivo (Fig. 1). Activated 
DCs in the lymph nodes — either DCs that 
are activated in the periphery and migrate into 
the lymph node, or DCs resident in the lymph 
node that are activated by factors carried by 
the FRC conduit system — are induced to 
upregulate CLEC-2, causing relaxation of the 
FRCs. This loosens up the lymph-node struc- 
ture, allowing influx of additional T cells and 
effectively expanding the volume of the organ. 

In line with this model and previous stud- 
ies’, Acton and colleagues found that, at early 
time points after an inflammatory stimulus in 
mice, when the cellularity and volume of their 
lymph nodes have already tripled, FRCs did not
undergo significant proliferation. Instead, the 
spacing of the FRC network increased, indicat- 
ing a stretched configuration of the cells. To 
test the potential involvement of podoplanin 
signalling in this process, the authors studied 
mice that were lacking CLEC-2 in DCs, and 
found that they show severely impaired lymph- 
node swelling after immunization. However, 
swelling was restored when podoplanin on 
FRCs was artificially bound by injecting the 
mice with CLEC-2 protein. 

This newly identified mechanism for 
lymph-node relaxation is thought-provoking. 
Besides lymph nodes, podoplanin-expressing 
FRC-like cellular networks are found in the 
thymus, spleen and tumour-associated stroma, 
and may play a similar part in regulating the 
size and cellularity of these organs and tissues. 
If the size of the local immune compartment 
could be pharmacologically altered by tuning 
the contractile state, such networks might even 
serve as potential targets for immunomodula- 
tion. Podoplanin is also induced on most tissue 
fibroblast cells during inflammatory states®. In 
this situation, CLEC-2-podoplanin signalling 


might biomechanically resolve swelling and 
inflammation by squeezing the tissue once 
CLEC-2 levels drop with the disappearance of 
pathogenic stimuli. 

Although it is highly plausible that DCs 
act as key messengers to mediate lymph- 
node relaxation, it seems probable that FRC 
contractility can also be tuned by other input 
signals. In this context, it is interesting to note 
that in many mammals (horses, dogs, humans 
and especially deep-diving seals), the spleen 
can contract under low-oxygen conditions to 
expel an ‘emergency reservoir’ of oxygenated 
red blood cells’. Furthermore, elevated num- 
bers of white blood cells have been associated 
with spleen and lymph-node contractions, 
indicating that these responses, which are 


mainly triggered by nervous-system signals®, 
could also serve immunological functions.

Potency needs constancy


In a finding that highlights ways to optimize the efficacy of antibody-based 
therapeutics and vaccines, the activity of potent HIV-1-neutralizing antibodies 
has been confirmed to depend on cellular binding to the antibodies’ Fc regions. 


ALEXANDRA TRKOLA 


Protection conferred against virus
infections by licensed antiviral vaccines
is mostly due to the generation of
‘neutralizing’ antibodies that bind to virus
particles in such a way that they block their 
entry into a cell’. Current strategies for design- 
ing treatments and vaccines for HIV-1 focus 
on such antibodies, particularly on the highly 
potent, broadly neutralizing antibodies that 
are produced during the immune responses 
of rare infected individuals” and which bind 
to the envelope protein (a highly variable pro- 
tein) of multiple strains of the virus. However, 
it is becoming clear that the impact of these 
antibodies is not limited to direct viral neu- 
tralization through binding. Writing in Cell, 
Bournazos et al.* show that in vivo activity 
of even the most potent broadly neutralizing 
anti-HIV-1 antibodies relies on antiviral func- 
tions that are triggered by binding of cellular 
receptors (the Fcγ receptors) to the ‘tail’
region of antibodies of the immunoglobulin G 
class, binding that was previously found’ to be 
crucial for sustaining the in vivo activity of 
less-potent neutralizing antibodies. 

Despite showing promising neutralizing 
activity in vitro, attempts to use broadly neu- 
tralizing antibodies to control HIV-1 infec- 
tions in vivo have had limited success. Up 
to now, only the most potent neutralizing 
antibodies, used at high dose and often only 
in combination, have yielded measurable
control of HIV-1, and this effect rapidly 
declines with waning antibody levels and as 
a result of mutations that allow the virus to 
‘escape’ antibody binding®”. Difficulties with 
tissue penetration’’ and reduced neutralizing 
activity during cell-to-cell viral transmission” 
may also contribute to the lower efficacy of 
these antibodies in vivo and necessitate the 
maintenance of antibody levels orders of mag- 
nitude above those sufficient in vitro”. 
Antibodies are notable for their capacity 
to link the adaptive and innate arms of the 
immune response: the variable domains of 
an antibody specifically recognize different 
targets, and their constant tail domain, the 
Fc region, stimulates an array of immune 
defences. The Fc region is bound by proteins 
of the complement system, which activate 
signalling pathways to induce inflammatory 
responses and destruction of antibody-bound 
cells or viruses. The Fc region of immuno- 
globulin G is also recognized by activating and 
inhibitory Fcγ receptors (FcγRs) on effector
cells of the immune system, such as monocytes, 
macrophages, dendritic cells, neutrophils and 
natural killer cells. Fc binding provides these 
cells with essential stimulatory and regulatory 
signals. FcγR-mediated effector-cell functions
include antibody-dependent cellular
cytotoxicity (in which antibodies bind to viral 
proteins expressed on the infected cell, trig- 
gering its killing by effector cells), phagocytic 

Figure 1 | Antibody-mediated viral control. a, Binding of certain (blue) but not other (red) antibodies’
variable regions to an HIV-1 virus particle can directly neutralize the virus if all envelope proteins required
for cell entry are inhibited. b, However, Bournazos et al. show that the antiviral activity of these neutralizing
antibodies in vivo depends on effects elicited when the constant (Fc) regions of immunoglobulin G antibody
molecules are bound by Fcγ receptors (FcγRs) on the surface of effector cells of the immune system. These
effects include engulfment and destruction (phagocytosis) of antibody-bound virus particles (including
neutralized and non-neutralized particles); the secretion of soluble factors that stimulate other immune
activities; and direct killing (antibody-dependent cellular cytotoxicity; ADCC) of virus-infected cells.


clearance (cellular engulfment and destruction 
of viruses), and the release of soluble antimi- 
crobial and immunomodulatory factors, such 
as cytokines and chemokines (Fig. 1). 
Recruitment of effector functions has long
been a focus in the harnessing of antibody-based
protection against HIV-1, but
the precise contributions of direct antibody-mediated
virus neutralization, activity of
the complement system and FcγR-mediated
effector functions remain unresolved.
Pioneering work indicated that neutralizing
antibodies require FcγR functions for their in vivo
activity, whereas non-neutralizing antibodies
that act solely through effector functions
have shown limited in vivo activity against
HIV-1, suggesting that a combination of
both neutralization and effector-mediated
activity is needed. By contrast, complement
action seems dispensable, at least when
neutralizing antibodies are used as a pre-exposure
prophylaxis (to try to prevent infection). Now,
Bournazos et al. expand on the previous finding
that FcγR-dependent mechanisms are
required by investigating the effect of antibody
treatment on HIV-1 infection in mice
in the presence or absence of Fc–FcγR
interactions, either by modulating the antibodies’
Fc region or by using mice lacking FcγRs.
Particularly notable is the authors’ find- 
ing that, despite their superior potency, even 
the action of broadly neutralizing antibod- 
ies is largely dependent on interactions with 
activating FcγRs. When the researchers engineered
the antibodies to improve the strength
of Fc–FcγR binding, they observed increased
viral control in vivo, highlighting the poten- 
tial of antibody improvement for therapeutic 


use. On reflection, however, this dependence 
on FcγR functions might prove to be a crucial
limitation on the use of engineered molecules 
designed to inhibit HIV entry to cells by tar- 
geting viral envelope proteins, because such 
molecules lack the ability to recruit immune 
effector functions. 

Intriguingly, in experiments using HIV 
particles that can complete only one round of 
infection, such that newly infected cells do not 
express HIV envelope proteins and hence lack 
binding sites for antibodies, Bournazos et al. 
still saw FcγR-dependent antibody-mediated
viral control. This suggests that classical 
antibody-dependent cellular cytotoxicity can 
be ruled out as a sole driving force, and that 
effector-cell release of soluble antiviral factors 
and the phagocytic clearance of viral particles 
may be decisive. 

Nevertheless, a central question that remains 
is, why would a potent neutralizing antibody 
need to rely on fast, FcγR-mediated removal of
antibody-bound viral particles? There are sev- 
eral possible explanations. Neutralization can 
be reversible if the antibody binding has a high 
off-rate'*'§, and ifan antibody fails to irrevers- 
ibly neutralize the virus, it may be crucial to 
eliminate these viruses rapidly before they 
regain infectivity. Tissue penetration may be 
another limiting factor’®: lower antibody doses 
at certain sites may mean that not all envelope 
proteins on the virus are immediately bound 
and neutralized. 

How many envelope proteins HIV carries, 
how many of these are needed to infect (and 
in turn must be neutralized), and how many 
antibody molecules are needed to trigger 
irreversible neutralization of envelope proteins 
will all also contribute to the rate of virus
inactivation. Alongside these stoichiometric
requirements, the kinetics of neutralization will 
be steered by the on-rate of antibody binding. 
Considering all these factors, enhanced clear- 
ance of antibody-bound viral particles may 
become relevant in situations in which the 
virus is not fully neutralized, either because 
the threshold of neutralizing-antibody binding 
required for inactivation has not been reached, 
or because the specific neutralizing antibody 
fails to irreversibly block the virus. In support 
of the idea that quantity of antibody, and hence 
antibody occupancy, plays a part, Bournazos 
and colleagues observed that eliciting FcγR
effector functions had an additive effect on viral 
control at lower, but not higher, antibody doses. 
Further work will be needed to precisely 
quantify the impact of phagocytic clearance, 
and to define if only one or a combination of 
FcγR-mediated functions is needed to achieve
in vivo control of HIV-1 by neutralizing anti- 
bodies. Several technical challenges may arise 
in attempts to investigate this. HIV-1 infection 
is commonly monitored by quantifying levels 
of viral RNA, but this approach does not assess 
the infectivity of the viral particles present, 
and so both neutralized and non-neutralized 
virions will be counted. Thus, short-term 
experiments in which the effects of neutral- 
izing antibodies are assessed solely on the basis 
of a reduction in RNA levels will not be able 
to quantify the contribution of Fc-mediated 
effects. Long-term monitoring of infections, as 
performed by Bournazos et al. and in previous
work, in combination with measurements of
circulating infectious virus and infected cells,
will be key to quantifying the influence of
FcγR-dependent mechanisms.

Genome sequence of a 45,000-year-old
modern human from western Siberia

Qiaomei Fu, Heng Li, Priya Moorjani, Flora Jay, Sergey M. Slepchenko, Aleksei A. Bondarev, Philip L. F. Johnson,
Ayinuer Aximu-Petri, Kay Prüfer, Cesare de Filippo, Matthias Meyer, Nicolas Zwyns, Domingo C. Salazar-García,
Yaroslav V. Kuzmin, Susan G. Keates, Pavel A. Kosintsev, Dmitry I. Razhev, Michael P. Richards, Nikolai V. Peristov,
Michael Lachmann, Katerina Douka, Thomas F. G. Higham, Montgomery Slatkin, Jean-Jacques Hublin,
David Reich, Janet Kelso, T. Bence Viola & Svante Pääbo


We present the high-quality genome sequence of a ~45,000-year-old modern human male from Siberia. This individual
derives from a population that lived before—or simultaneously with—the separation of the populations in western and 
eastern Eurasia and carries a similar amount of Neanderthal ancestry as present-day Eurasians. However, the genomic 
segments of Neanderthal ancestry are substantially longer than those observed in present-day individuals, indicating 
that Neanderthal gene flow into the ancestors of this individual occurred 7,000-13,000 years before he lived. We
estimate an autosomal mutation rate of 0.4 × 10⁻⁹ to 0.6 × 10⁻⁹ per site per year, a Y chromosomal mutation rate of
0.7 × 10⁻⁹ to 0.9 × 10⁻⁹ per site per year based on the additional substitutions that have occurred in present-day
non-Africans compared to this genome, and a mitochondrial mutation rate of 1.8 × 10⁻⁸ to 3.2 × 10⁻⁸ per site per year
based on the age of the bone.


In 2008, a relatively complete left human femoral diaphysis was discovered
on the banks of the river Irtysh (Fig. 1a, c, d), near the settlement
of Ust’-Ishim in western Siberia (Omsk Oblast, Russian Federation).
Although the exact locality is unclear, the femur was eroding out of
alluvial deposits on the left bank of the river, north of Ust’-Ishim. Here,
Late Pleistocene and probably redeposited Middle Pleistocene fossils
are found in sand and gravel layers that are about 50,000-30,000 years
old (that is, from Marine Oxygen Isotope Stage 3).


Morphology, dating and diet

The proximal end of the bone shows a large gluteal buttress and gluteal
tuberosity, while the midshaft is dominated by a marked linea aspera,
resulting in a teardrop-shaped cross-section (Fig. 1e, f) (for details, see
Supplementary Information section 3). The morphology of the proximal
end of the shaft is similar to Upper Paleolithic modern humans and
distinct from Neanderthals (Supplementary Table 3.1, Supplementary
Fig. 3.2), while the teardrop-shaped cross-section of the midshaft is similar
to most Upper Paleolithic humans and early anatomically modern
humans. Taken together, this suggests that the Ust’-Ishim femur derives
from a modern human.

Two samples of 890 mg and 450 mg of the bone were removed on separate
occasions for dating. Collagen preservation satisfied all criteria for
dating and after ultrafiltration we obtained ages of 41,400 ± 1,300 years
before present (BP) (OxA-25516) and 41,400 ± 1,400 BP (OxA-30190).
These two dates, when combined and corrected for fluctuations of
atmospheric ¹⁴C through time, correspond to an age of about 45,000 calibrated
years BP (46,880-43,210 cal BP at 95.4% probability, Supplementary
Information section 1). The Ust’-Ishim individual is therefore the oldest 
directly radiocarbon-dated modern human outside Africa and the Mid- 
dle East (Fig. 1b). Carbon and nitrogen isotope ratios indicate that the 
diet of the Ust’-Ishim individual (Supplementary Information section 4) 
was based on terrestrial C3 plants and animals that consumed them, but
also that an important part of his dietary protein may have come from 
aquatic foods, probably freshwater fish, something that has been ob- 
served in other early Upper Palaeolithic humans from Europe’. 
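
As a rough illustration of how two radiocarbon determinations can be pooled before calibration, the following minimal sketch takes an inverse-variance weighted mean of the two uncalibrated ages quoted above. It is an assumption-laden simplification: the function name `combine_dates` is hypothetical, the uncertainties are treated as one-sigma Gaussian errors, and the calibration against the atmospheric curve that yields the ~45,000 cal BP figure is not reproduced here.

```python
import math

def combine_dates(dates):
    """Inverse-variance weighted mean of (age_bp, one_sigma) pairs.

    Assumes independent Gaussian errors; this is an illustrative
    pooling step, not the calibration procedure used in the paper.
    """
    weights = [1.0 / sigma ** 2 for _, sigma in dates]
    total = sum(weights)
    mean = sum(w * age for w, (age, _) in zip(weights, dates)) / total
    sigma = math.sqrt(1.0 / total)
    return mean, sigma

# The two uncalibrated ages reported for OxA-25516 and OxA-30190.
pooled_age, pooled_sigma = combine_dates([(41400, 1300), (41400, 1400)])
print(f"pooled uncalibrated age: {pooled_age:.0f} +/- {pooled_sigma:.0f} BP")
```

The pooled uncertainty is smaller than either individual one, which is why combining the dates before calibration narrows the final calibrated interval.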


DNA retrieval and sequencing 


Nine samples of between 41 and 130 mg of bone material were removed 
from the distal part of the femur and used to construct DNA libraries 
using a protocol designed to facilitate the retrieval of short and damaged 
DNA*. The percentage of DNA fragments in these libraries that could 
be mapped to the human genome varied between 1.8% and 10.0% (Sup- 
plementary Table 1.1). From the extract containing the highest propor- 
tion of human DNA, eight further libraries were constructed. Each of 
these libraries was treated with uracil-DNA glycosylase and endonucle- 
ase VIII to remove deaminated cytosine residues, and library molecules 
with inserts shorter than approximately 35 base pairs (bp) were depleted 
by preparative acrylamide gel electrophoresis before sequencing on the 
Illumina HiSeq platform (Supplementary Information section 6). In total, 
42-fold sequence coverage of the ~ 1.86 gigabases (Gb) of the autosomal 
genome to which short fragments can be confidently mapped was gen- 
erated. The coverage of the X and Y chromosomes was approximately 
Figure 1 | Geographic location, morphology and dating. a, Map of Siberia
with major archaeological sites. Red triangles: Neanderthal fossils; white
circle within a red (Neanderthal) triangle: Denisovan fossils; blue square:
Initial Upper Palaeolithic sites; yellow asterisk: Ust’-Ishim. 1: Ust’-Ishim; 2:
Chagyrskaya Cave; 3: Okladnikov Cave; 4: Denisova Cave; 5: Kara-Bom.

b, Radiocarbon ages of early modern human fossils in northern Eurasia and the
NGRIP δ¹⁸O palaeotemperature record. Specimens in light grey are indirectly


half that of the autosomes (~22-fold), indicating that the bone comes 
from a male. A likelihood method estimated present-day human mito- 
chondrial DNA (mtDNA) contamination’ to 0.50% (95% confidence 
interval (CI) 0.26-0.94%), whereas a method that uses the frequency of 
non-consensus bases in autosomal sequences estimated the contam- 
ination to be less than 0.13% (Supplementary Information section 7). 
Thus, less than 1% of the hominin DNA fragments sequenced are esti- 
mated to be extraneous to the bone. After consensus genotype calling, 
such low levels of contamination will tend to be eliminated. 
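
The sex inference described above rests on a simple coverage ratio, which can be sketched as follows. This is a hypothetical helper (`infer_sex` and its tolerance are assumptions, not part of the published pipeline); it only encodes the expectation that a male carries one X per two autosomal copies.

```python
def infer_sex(autosomal_cov, x_cov, tol=0.15):
    """Classify sex from the X-to-autosome coverage ratio.

    A ratio near 0.5 suggests one X copy (male); near 1.0 suggests
    two X copies (female). The tolerance is an illustrative choice.
    """
    ratio = x_cov / autosomal_cov
    if abs(ratio - 0.5) <= tol:
        return "male"
    if abs(ratio - 1.0) <= tol:
        return "female"
    return "undetermined"

# Coverage values quoted in the text: 42-fold autosomal, ~22-fold X.
call = infer_sex(42.0, 22.0)
print(call)
```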


Population relationships 

About 7.7 positions per 10,000 are heterozygous in the Ust’-Ishim 
genome, whereas between 9.6 and 10.5 positions are heterozygous in 
present-day Africans and 5.5 and 7.7 in present-day non-Africans (Sup-
plementary Information section 12). Thus, with respect to genetic di- 
versity, the population to which the Ust’-Ishim individual belonged was 
more similar to present-day Eurasians than to present-day Africans, 
which probably reflects the out-of-Africa bottleneck shared by non- 
African populations. The Ust’-Ishim mtDNA sequence falls at the root 
of a large group of related mtDNAs (the ‘R haplogroup’), which occurs
today across Eurasia (Supplementary Information section 8). The Y 
chromosome sequence of the Ust’-Ishim individual is similarly inferred 
to be ancestral to a group of related Y chromosomes (haplogroup K(xLT)) 
that occurs across Eurasia today® (Supplementary Information section 9). 
As expected, the number of mutations inferred to have occurred on the 
dated (OxCal v4.2.3 (ref. 33); r:5 IntCal13 atmospheric curve). H5: Heinrich 5
event, H4: Heinrich 4 event, GI 12: Greenland Interstadial 12. For a more 
extensive comparison see Supplementary Information Fig. 2.1. c-f, The Ust’- 
Ishim 1 femur. c, Lateral view. d, Posterior view. e, Cross-section at the 80 
percent level. f, Cross-section at the midshaft. For other views see 
Supplementary Fig. 3.1. 


branch leading to the Ust’-Ishim mtDNA is lower than the numbers 
inferred to have occurred on the branches leading to related present- 
day mtDNAs (Supplementary Fig. 8.1). Using this observation and nine 
directly carbon-dated ancient modern human mtDNAs as calibration 
points”” in a relaxed molecular clock model, we estimate the age of the 
Ust’-Ishim bone to be ~49,000 years BP (95% highest posterior den- 
sity: 31,000-66,000 years BP), consistent with the radiocarbon date. 
In a principal component analysis of the Ust’-Ishim autosomal ge- 
nome along with genotyping data from 922 present-day individuals 
from 53 populations® (Fig. 2a), the Ust’-Ishim individual clusters with 
non-Africans rather than Africans. When only non-African popula- 
tions are analysed (Fig. 2b), the Ust’-Ishim individual falls close to zero 
on the two first principal component axes, suggesting that it does not share 
much more ancestry with any particular group of present-day humans. 
To determine how the Ust’-Ishim genome is related to the genomes of 
present-day humans, we tested, using D statistics, whether it shares more 
derived alleles with one modern human than with another modern human 
using pairs of human genomes from different parts of the world (Fig. 3). 
Based on genotyping data for 87 African and 108 non-African indivi- 
duals (Supplementary Information section 11), the Ust’-Ishim genome 
shares more alleles with non-Africans than with sub-Saharan Africans 
(|Z| = 41-89), consistent with the principal component analysis, mtDNA 
and Y chromosome results. Thus, the Ust’-Ishim individual represents 
a population derived from, or related to, the population involved in the 
dispersal of modern humans out of Africa. Among the non-Africans, 
Figure 2 | Principal Components (PC) analysis exploring the relationship of
Ust’-Ishim to present-day humans. a, PC analysis using 922 present-day 
individuals from 53 populations and the Ust’-Ishim individual. b, PC analysis 


the Ust’-Ishim genome shares more derived alleles with present-day 
people from East Asia than with present-day Europeans (|Z| = 2.1-6.4). 
However, when an ~8,000-year-old genome from western Europe (La
Braña) or a 24,000-year-old genome from Siberia (Mal’ta 1) were
analysed, there is no evidence that the Ust’-Ishim genome shares more 
derived alleles with present-day East Asians than with these prehistoric 
individuals (|Z| < 2). This suggests that the population to which the Ust’- 
Ishim individual belonged diverged from the ancestors of present-day 




using Eurasian individuals and the Ust’-Ishim individual. The percentages of 
the total variance explained by each eigenvector are given. 


West Eurasian and East Eurasian populations before—or simultaneously 
with—their divergence from each other. The finding that the Ust’-Ishim 
individual is equally closely related to present-day Asians and to 8,000- 
to 24,000-year-old individuals from western Eurasia, but not to present- 
day Europeans, is compatible with the hypothesis that present-day 
Europeans derive some of their ancestry from a population that did 
not participate in the initial dispersals of modern humans into Europe 
and Asia’'. 


Mutation rate estimates

The high-quality Ust’-Ishim genome sequence, in combination with its
radiocarbon date, allows us to gauge the rate of mutations by estimating


Figure 3 | Statistics testing whether the Ust’-Ishim genome shares more 
derived alleles with one or the other of two modern human genomes (X, Y). 
We computed D statistics of the form D (X, Y, Ust’-Ishim, Chimpanzee) using 
a subset of the genome-wide SNP array data from the Affymetrix Human 
Origins array and restricting the analysis to transversions. Error bars 
correspond to three standard errors. Red bars indicate that the D statistic is 
significantly different from 0 (|Z|>2), such that the Ust’-Ishim genome 
shares more derived alleles with the genome on the right (X) than the left (Y). 
Ancient genomes are given in italics. 
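
A D statistic of the kind used in Figure 3 can be sketched as below. This is a minimal illustration under stated assumptions: alleles are coded 0 (ancestral, matching the chimpanzee outgroup) or 1 (derived), the sign convention (positive when the test genome shares more derived alleles with X) is chosen here for clarity and may differ from the paper's, and the Z score comes from a simple equal-weight delete-one block jackknife rather than the weighted version used in published tools. The function names are hypothetical.

```python
import math

def d_statistic(sites):
    """D from a list of (x, y, t) alleles with the outgroup ancestral.

    BABA: X and T derived, Y ancestral. ABBA: Y and T derived, X ancestral.
    """
    abba = sum(1 for s in sites if s == (0, 1, 1))
    baba = sum(1 for s in sites if s == (1, 0, 1))
    total = abba + baba
    return (baba - abba) / total if total else 0.0

def jackknife_z(blocks):
    """Return (D, Z) using a delete-one jackknife over genomic blocks."""
    n = len(blocks)
    full = d_statistic([s for b in blocks for s in b])
    leave_one_out = [
        d_statistic([s for j, b in enumerate(blocks) if j != i for s in b])
        for i in range(n)
    ]
    mean = sum(leave_one_out) / n
    se = math.sqrt((n - 1) / n * sum((v - mean) ** 2 for v in leave_one_out))
    return full, (full / se if se > 0 else float("inf"))

# Toy data (hypothetical): three blocks of informative sites.
blocks = [
    [(1, 0, 1), (1, 0, 1), (0, 1, 1)],
    [(1, 0, 1), (0, 1, 1)],
    [(1, 0, 1), (1, 0, 1), (1, 0, 1), (0, 1, 1)],
]
d_value, z_score = jackknife_z(blocks)
print(d_value, z_score)
```

Sites where X and Y carry the same allele are uninformative and drop out of both counts, which is why only the two discordant patterns appear in the calculation.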


mutation rate of 0.43 × 10⁻⁹ per site per year (95% CI 0.38 × 10⁻⁹ to
0.49 × 10⁻⁹) that is consistent across all non-African genomes regardless of
their coverage (Supplementary Information section 14). This overall
rate, as well as the relative rates inferred for different mutational classes
(transversions, non-CpG transitions, and CpG transitions), is similar to
the rate observed for de novo estimates from human pedigrees (~0.5 ×
10⁻⁹ per site per year) and to the direct estimate of branch shortening
(Supplementary Information section 10). As discussed elsewhere,
these rates are slower than those estimated using calibrations based on 
the fossil record and thus suggest older dates for the splits of modern 
Figure 4 | Inferred population size changes over time. ‘Time’ on the x axis
refers to the pairwise per-site sequence divergence. If we erroneously assume 
that Ust’-Ishim lived today, its inferred population size history includes an 
out-of-Africa-like population bottleneck that is more recent than that seen in 
present-day non-Africans (red bold curve). By shifting the Ust’-Ishim curve 
to align with those in present-day non-Africans (blue bold curve), and 
assuming that the number of mutations necessary to do this corresponds to 
45,000 years, we estimate the autosomal mutation rate to be 0.38 × 10⁻⁹ to
0.49 × 10⁻⁹ per site per year. The times indicated on the top of the figure
are based on this mutation rate. 


human and archaic populations. We caution, however, that rates may 
have changed over time and may differ between human populations. 
However, we expect this mutation rate estimate to apply at least to 
non-African populations over the past 45,000 years. 

We also estimated a phylogeny relating the non-recombining part of
the Ust’-Ishim Y chromosome to those of 23 present-day males. Using 
this phylogeny, we measured the number of ‘missing’ mutations in the 
Ust’-Ishim Y chromosomal lineage relative to the most closely related 
present-day Y chromosome analysed. This results in an estimate of the 
Y chromosome mutation rate of 0.76 × 10⁻⁹ per site per year (95% CI
0.67 × 10⁻⁹ to 0.86 × 10⁻⁹) (Supplementary Information section 9),
significantly higher than the autosomal mutation rate, consistent with
mutation rates in males being higher than in females. Finally, using
the radiocarbon date of the Ust’-Ishim femur together with the mtDNAs
of 311 present-day humans, we estimated the mutation rate of the
complete mtDNA to be 2.53 × 10⁻⁸ substitutions per site per year (95%
highest posterior density: 1.76 × 10⁻⁸ to 3.23 × 10⁻⁸) (Supplementary
Information section 8), in agreement with a previous
study.
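
The logic behind these estimates is that an ancient genome has had less time to accumulate mutations than present-day genomes, so the deficit divided by the age of the bone gives a rate. A minimal sketch of that arithmetic follows; the per-site deficit used here is a hypothetical placeholder, not a value from the paper, and the function name is an assumption. Only the calibrated age interval (43,210 to 46,880 years) is taken from the text.

```python
def rate_from_missing_mutations(missing_per_site, age_years):
    """Mutation rate implied by a per-site deficit of mutations on an
    ancient lineage, relative to present-day lineages, over its age."""
    return missing_per_site / age_years

# Hypothetical deficit of derived mutations per site (placeholder value).
MISSING_PER_SITE = 1.9e-5

# Calibrated age interval of the Ust'-Ishim femur from the text.
rate_if_oldest = rate_from_missing_mutations(MISSING_PER_SITE, 46880)
rate_if_youngest = rate_from_missing_mutations(MISSING_PER_SITE, 43210)
print(f"{rate_if_oldest:.2e} to {rate_if_youngest:.2e} per site per year")
```

Because the age sits in the denominator, uncertainty in the radiocarbon date propagates directly into the width of the rate interval.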


Neanderthal admixture 


The time of admixture between modern humans and Neanderthals has 
previously been estimated to 37,000-86,000 years BP based on the size
Figure 5 | Regions of Neanderthal ancestry on chromosome 12 in the
Ust’-Ishim individual and fifteen present-day non-Africans. The analysis is
based on SNPs where African genomes carry the ancestral allele and the
of the DNA segments contributed by Neanderthals to present-day
non-Africans. Thus, the Ust’-Ishim individual could pre-date the Neanderthal
thal admixture. From the extent of sharing of derived alleles between the 
Neanderthal and the Ust’-Ishim genomes we estimate the proportion 
of Neanderthal admixture in the Ust’-Ishim individual to be 2.3 + 0.3% 
(Supplementary Information section 16), similar to present-day east 
Asians (1.7-2.1%) and present-day Europeans (1.6-1.8%). Thus, admix- 
ture with Neanderthals had already occurred by 45,000 years ago. In 
contrast, we fail to detect any contribution from Denisovans, although 
such a contribution exists in present-day people not only in Oceania,
but to a lesser extent also in mainland east Asia (Supplementary
Information section 17). 

The DNA segments contributed by Neanderthals to the Ust’-Ishim 
individual are expected to be longer than such segments in present- 
day people as the Ust’-Ishim individual lived closer in time to when the 
admixture occurred, so there was less time for the segments to be frag- 
mented by recombination. To test if this is indeed the case, we identified 
putative Neanderthal DNA segments in the Ust’-Ishim and present- 
day genomes based on derived alleles shared with the Neanderthal ge- 
nome at positions where Africans are fixed for ancestral alleles. Figure 5 
shows that fragments of putative Neanderthal origin in the Ust’-Ishim 
individual are substantially longer than those in present-day humans. 
We use the covariance in such derived alleles of putative Neanderthal 
origin across the Ust’-Ishim genome to infer that mean fragment sizes 
in the Ust’-Ishim genome are in the order of ~ 1.8-4.2 times longer than 
in present-day genomes and that the Neanderthal gene flow occurred 
232-430 generations before the Ust’-Ishim individual lived (Supplemen- 
tary Information section 18; Fig. 6). Under the simplifying assumption 
that the gene flow occurred as a single event, and assuming a generation 
time of 29 years, we estimate that the admixture between the ancestors
of the Ust’-Ishim individual and Neanderthals occurred approximately
50,000 to 60,000 years BP, which is close to the time of the major
expansion of modern humans out of Africa and the Middle East. How- 
ever, we also note that the presence of some longer fragments (Fig. 5) may 
indicate that additional admixture occurred even later. Nevertheless, 
these results suggest that the bulk of the Neanderthal contribution to 
present-day people outside Africa does not go back to mixture between 
Neanderthals and the anatomically modern humans who lived in the 
Middle East at earlier times; for example, the modern humans whose 
remains have been found at Skhul and Qafzeh.
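
The conversion from generations to calendar years in the paragraph above is simple arithmetic, sketched here for transparency. The function name is hypothetical; the inputs (232 to 430 generations, 29 years per generation, an age of about 45,000 years) are the values stated in the text, and the single-pulse assumption is the same simplification the authors make.

```python
def admixture_window(gen_min, gen_max, generation_time, age_bp):
    """Return ((years before the individual), (years BP)) for admixture."""
    before = (gen_min * generation_time, gen_max * generation_time)
    dates_bp = (age_bp + before[0], age_bp + before[1])
    return before, dates_bp

years_before, dates_bp = admixture_window(232, 430, 29, 45000)
print(years_before)  # years before the Ust'-Ishim individual lived
print(dates_bp)      # corresponding dates in years BP
```

The first interval rounds to the 7,000-13,000 years quoted in the abstract, and the second falls inside the approximate 50,000 to 60,000 years BP range given above.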


An Initial Upper Paleolithic individual? 

A common model for the modern human colonization of Asia assumes
that an early coastal migration gave rise to the present-day people of 
Oceania, while a later more northern migration gave rise to Europeans 
and mainland Asians. The fact that the 45,000-year-old individual from 
Siberia is not more closely related to the Onge from the Andaman 


Neanderthal genome carries the derived allele. Homozygous ancestral alleles 
are black, heterozygous derived alleles yellow, and homozygous derived alleles 
blue. 
Figure 6 | Dating the Neandertal admixture in Ust’-Ishim and present-day
non-Africans. Exponentially fitted curves showing the decay of pairwise 
covariance for variable positions where Africans carry ancestral alleles and the 
Neanderthal genome carries derived alleles. 
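
The exponential fit mentioned in the Figure 6 caption can be illustrated with a log-linear least-squares fit. This sketch assumes a pure decay of the form cov(d) = A exp(-t d), with genetic distance d in Morgans and t in generations; the published analysis may include additional terms, and the data below are synthetic, generated from a hypothetical t of 300 generations.

```python
import math

def fit_generations(dist_morgans, covariance):
    """Fit cov(d) = A * exp(-t * d) by least squares on log(cov).

    Returns (t, A), where t is the number of generations since admixture
    under a single-pulse model. Covariances must be positive.
    """
    ys = [math.log(c) for c in covariance]
    n = len(dist_morgans)
    mean_x = sum(dist_morgans) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in dist_morgans)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dist_morgans, ys))
    slope = sxy / sxx
    return -slope, math.exp(mean_y - slope * mean_x)

# Synthetic decay curve (hypothetical values, not data from the paper).
distances = [0.0005 * k for k in range(1, 21)]
covariances = [0.02 * math.exp(-300 * d) for d in distances]
generations, amplitude = fit_generations(distances, covariances)
print(round(generations, 3), round(amplitude, 5))
```

A steeper decay means more recombination events have broken up the introgressed segments, so a shallower curve for an ancient genome than for present-day genomes indicates that it lived closer in time to the admixture.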


Islands (putative descendants of an early coastal migration) than he is 
to present-day East Asians or Native Americans (putative descendants 
of a northern migration) (Fig. 3) shows that at least one other group to 
which the ancestors of the Ust’-Ishim individual belonged colonized 
Asia before 45,000 years ago. Interestingly, the Ust’-Ishim individual 
probably lived during a warm period (Greenland Interstadial 12) that 
has been proposed to be a time of expansion of modern humans into 
Europe. However, the latter hypothesis is based only on the appearance
of the so-called ‘Initial Upper Paleolithic’ industries (Supplementary
Information section 5), and not on the identification of modern
human remains. It is possible that the Ust’-Ishim individual was associated
with the Asian variant of Initial Upper Paleolithic industry, documented
at sites such as Kara-Bom in the Altai Mountains at about
47,000 years BP. This individual would then represent an early modern
human radiation into Europe and Central Asia that may have failed to
leave descendants among present-day populations.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


All sequencing was performed on the Illumina HiSeq 2000 and base-calling was 
carried out using Ibis 1.1.6 9 (ref. 35). Reads were merged and remaining adaptor 
sequences trimmed before being aligned to the Human reference genome (GRCh37/ 
1000 Genomes) using BWA (version 0.5.10) (ref. 36). GATK version 1.3 (v1.3-14-g348f2b)
was used to produce genotype calls for each site. We excluded from analysis tan- 
dem repeats and regions of the genome that are not unique. We considered only 


genomic regions that fall within the 95% coverage distribution (Supplementary In- 
formation section 7) and where at least 99% of overlapping 35mers covering a pos- 
ition map uniquely, allowing one mismatch. 


35. Kircher, M., Stenzel, U. & Kelso, J. Improved base calling for the Illumina Genome 
Analyzer using machine learning strategies. Genome Biol. 10, R83 (2009). 

36. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler
transform. Bioinformatics 25, 1754-1760 (2009). 

Pulmonary macrophage transplantation therapy

Takuji Suzuki, Paritha Arumugam, Takuro Sakagami, Nico Lachmann, Claudia Chalk, Anthony Sallese, Shuichi Abe,
Cole Trapnell, Brenna Carey, Thomas Moritz, Punam Malik, Carolyn Lutzko, Robert E. Wood & Bruce C. Trapnell


Bone-marrow transplantation is an effective cell therapy but requires myeloablation, which increases infection risk and 
mortality. Recent lineage-tracing studies documenting that resident macrophage populations self-maintain independently
of haematological progenitors prompted us to consider organ-targeted, cell-specific therapy. Here, using
granulocyte-macrophage colony-stimulating factor (GM-CSF) receptor-β-deficient (Csf2rb−/−) mice that develop a myeloid cell
disorder identical to hereditary pulmonary alveolar proteinosis (hPAP) in children with CSF2RA or CSF2RB mutations, we
show that pulmonary macrophage transplantation (PMT) of either wild-type or Csf2rb-gene-corrected macrophages
without myeloablation was safe and well-tolerated and that one administration corrected the lung disease, secondary
systemic manifestations and normalized disease-related biomarkers, and prevented disease-specific mortality.
PMT-derived alveolar macrophages persisted for at least one year as did therapeutic effects. Our findings identify mechanisms
regulating alveolar macrophage population size in health and disease, indicate that GM-CSF is required for phenotypic
determination of alveolar macrophages, and support translation of PMT as the first specific therapy for children with hPAP.


Mutations in CSF2RA or CSF2RB, encoding GM-CSF receptor α or β,
respectively, cause hPAP by impairing GM-CSF-dependent surfactant
clearance by alveolar macrophages, resulting in progressive surfactant
accumulation in alveoli and hypoxaemic respiratory failure. Surfactant
normally comprises a thin phospholipid/protein layer reducing tension
on the alveolar surface that is maintained by balanced secretion by
alveolar type II epithelial cells and clearance by these cells and alveolar
macrophages. PAP also occurs in people with GM-CSF autoantibodies
(~85-90% of all patients with PAP) and mice with disruption of the
GM-CSF gene Csf2 (refs 11, 12) or the GM-CSF receptor β-subunit gene
Csf2rb (refs 13, 14) (Csf2−/− or Csf2rb−/− mice, respectively).
Characteristics of PAP caused by disruption of GM-CSF signalling include
typical lung histopathology (well preserved alveoli filled with surfactant
and ‘foamy’ macrophages staining positive with periodic acid-Schiff
(PAS) or oil red O); turbid, ‘milky’ appearing bronchoalveolar lavage
(BAL) caused by accumulated surfactant and cell debris; and a
disease-specific pattern of biomarkers (increased GM-CSF (hPAP), M-CSF/CSF1
and MCP-1 in BAL fluid, and reduced mRNA for PU.1, PPARG and
ABCG1 in alveolar macrophages).

Currently, no pharmacological therapy of hPAP exists and surfactant 
must be removed by whole-lung lavage, an inefficient, invasive procedure 
to physically remove excess surfactant. In Csf2rb−/− mice, PAP was
corrected by bone marrow transplantation (BMT) of wild-type (WT)
or Csf2rb-gene-corrected Csf2rb−/− haematopoietic stem/progenitor
cells (HSPCs). However, in humans this approach resulted in death from
infection before engraftment, probably as a result of required
myeloablation/immunosuppressive therapy. Since pulmonary GM-CSF is increased
in hPAP we hypothesized that macrophages administered directly
into the lungs (pulmonary macrophage transplantation or PMT) with- 
out myeloablation would engraft and reverse the manifestations of hPAP. 

We first validated Csf2rb−/− mice as a model of human hPAP by dem-
onstrating that they had the same clinical, physiological, histopathological 


and biochemical abnormalities, disease biomarkers and natural history 
(Fig. 1 and Extended Data Fig. 1) as children with hPAP.


Characterization of macrophages before PMT 

Bone-marrow-derived macrophages (BMDMs) from WT mice had morphology
and phenotypic markers (F4/80Hi, CD11bHi, CD11c+, CD14+,
CD16/32+, CD64+, CD68+, CD115+, CD131+, SiglecFLow, MerTK+,
MHC class II+, Ly6G−, CD3−, CD19−) of macrophages (Extended Data
Fig. 2a-c) and contained <0.0125% lineage negative (Lin−) Sca1+c-Kit+
(LSK) cells. Clonogenic analysis indicated <0.005% colony-forming
units-granulocyte, monocyte/macrophage (CFU-GM) and no burst-forming
units-erythrocyte (BFU-E) or colony-forming units-granulocyte,
erythrocyte, monocyte/macrophage, megakaryocyte (CFU-GEMM)
progenitors (Extended Data Fig. 2d, e). Functional evaluation showed that
the BMDMs could clear surfactant (Extended Data Fig. 2f, g). These 
results demonstrated that the cells used for PMT were highly purified, 
mature macrophages capable of surfactant clearance. 


Efficacy of PMT of WT macrophages 

To determine the therapeutic potential of PMT, Csf2rb−/− mice received
WT (Csf2rb+/+) BMDMs by PMT once (Fig. 1a). One year later,
PMT-derived CD131+ BAL cells were present (Fig. 1b), alveolar macrophages
expressed Csf2rb (Extended Data Fig. 3a), and BAL was markedly improved
with respect to opacification (Fig. 1c), sediment (Fig. 1c) and microscopic
cytopathology (Extended Data Fig. 3b). Importantly, PMT nearly
completely resolved the abnormal pulmonary histopathology (Fig. 1d and
Extended Data Fig. 3c). Measurement of BAL turbidity and surfactant
protein-D (SP-D) content (Fig. 1e), which reflect the extent of surfactant
accumulation across the entire lung surface, confirmed the improvement
in hPAP. BAL fluid biomarkers of hPAP were also improved (Fig. 1f).
The effects of PMT were evident early, as demonstrated by detection of
CD131+ alveolar macrophages with Csf2rb mRNA and protein (not


1Division of Pulmonary Biology, Perinatal Institute, Cincinnati Children’s Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, Ohio 45229, USA. 2Division of Experimental Hematology, Cincinnati
Children’s Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, Ohio 45229, USA. 3RG Reprograming and Gene Therapy, Institute of Experimental Hematology, Hannover Medical School,
Carl-Neuberg-Str. 1, 30625 Hannover, Germany. 4Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA. 5Broad Institute of Massachusetts Institute of
Technology and Harvard University, Cambridge, Massachusetts 02138, USA. 6Division of Pulmonary Medicine, Cincinnati Children’s Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, Ohio 45229,
USA. 7Division of Pulmonary, Critical Care, and Sleep Medicine, University of Cincinnati Medical Center, 3333 Burnet Avenue, Cincinnati, Ohio 45229, USA.


450 | NATURE | VOL 514 | 23 OCTOBER 2014 


©2014 Macmillan Publishers Limited. All rights reserved 




Figure 1 | Therapeutic efficacy of PMT in Csf2rb−/− mice. a, Schematic of
the method used. WT HSPCs (1) were isolated, expanded (2), differentiated
into macrophages (3), and administered by endotracheal instillation into
2-month-old Csf2rb−/− (KO) mice (4) and evaluated after 2 months (2M)
(e-g) or one year (1Y) (b-h) with age-matched, untreated WT or Csf2rb−/−
mice (KO+PMT, WT or KO, respectively). b, CD131-immunostained BAL
cells. c, Appearance of BAL fluid (left) or sediment (right). d, Lung histology
after staining with haematoxylin and eosin (H&E), PAS, Masson’s trichrome
(MT), or surfactant protein B (SP-B). Scale bar, 100 μm; inset, 50 μm.
e, BAL turbidity and SP-D concentration. f, BAL biomarkers. g, Alveolar
macrophage biomarkers. h, Effects of PMT on blood haemoglobin (Hb),
haematocrit (Hct) and serum erythropoietin (Epo). i, Kaplan-Meier analysis of
PMT-treated (n = 43) and untreated Csf2rb−/− mice (n = 48). Images are
representative of 6 mice per group (b-d). Numeric data are mean ± s.e.m. of
7 (2M) or 6 (1Y) mice per group. *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001.


shown), reduced BAL opacification and cytopathology (not shown), BAL 
turbidity (Fig. 1e), SP-D (Fig. 1e) and BAL fluid biomarkers (Fig. 1f)
2 months after PMT, and reduced lung histopathology 4 months after
PMT (not shown). In contrast, PMT of Csf2rb−/− BMDMs had no effect
on BAL turbidity, SP-D content, or BAL fluid biomarkers (not shown), 
demonstrating that GM-CSF receptors on transplanted macrophages 
are important for the therapeutic effects. 

To evaluate the effects of PMT on the alveolar macrophage population,
we measured cellular biomarkers after PMT. Alveolar macrophages from
PMT-treated Csf2rb−/− mice had increased mRNA for PU.1, Pparg and
Abcg1; the improvement was significant by 2 months and persisted
1 year after PMT (Fig. 1g).

Since Csf2rb−/− mice develop polycythaemia, a secondary consequence
of hypoxaemia in chronic lung diseases, we evaluated the
effects of PMT on this systemic clinical manifestation. Notably, PMT
corrected polycythaemia in Csf2rb−/− mice (Fig. 1h).

Finally, we evaluated the effects of PMT on hPAP-associated mortality
by comparing the survival of PMT-treated and untreated Csf2rb−/−
mice. PMT increased the lifespan of Csf2rb−/− mice by 107 days, from
555 (median; interquartile range 507-592) days to 662 (604-692) days
(Fig. 1i). In separate studies of treated Csf2rb−/− mice surviving to 617
(604-631) days (561 (548-575) days after PMT of WT BMDMs), CD131+
alveolar macrophages were still present and BAL turbidity remained
low compared to untreated Csf2rb−/− mice that survived to 631 (631-631)
days (optical density at 600 nm (OD600) = 0.75 ± 0.17 versus 2.63 ± 0.44;
n = 8, 4, respectively; P < 0.001). However, such long-term evaluation of
laboratory abnormalities is obfuscated by reduced survival of untreated
Csf2rb−/− mice.

These results demonstrate that PMT had a highly efficacious and
durable therapeutic effect on the primary pulmonary and secondary
systemic manifestations of hPAP in Csf2rb−/− mice.


Macrophage engraftment efficiency 


We next evaluated the effects of cell dose (0.5, 1, 2 and 4 million) and
repeated administration (one versus four monthly transplantations)
on PMT efficacy (Extended Data Tables 2 and 3, respectively). Neither
treatment significantly affected efficacy in the range evaluated, and one
dose of 2 million cells was used for PMT in the remaining studies.
To determine whether WT macrophages had a survival advantage
over Csf2rb−/− macrophages, we measured GM-CSF bioactivity in BAL
fluid and found that it was detectable in Csf2rb−/− but not WT mice
(Extended Data Fig. 1h). WT macrophages had increased
survival/proliferation compared to Csf2rb−/− macrophages in vitro (Fig. 2a) and
accumulated to greater numbers after PMT in Csf2rb−/− mice than in
WT mice (Fig. 2b and Extended Data Fig. 3d). PMT of WT Lys-MGFP
knock-in mouse BMDMs into Csf2rb−/− mice followed by Ki67
immunostaining revealed that PMT-derived cells replicated in vivo (Extended
Data Fig. 3e-g). The percentage of Ki67+ PMT-derived alveolar
macrophages was 32.2 ± 6.05% 1 month after PMT and declined to 11.29 ± 2.2%
by 1 year (Fig. 2c), similar to baseline Ki67+ immunostaining of
alveolar macrophages in age-matched, normal WT mice (Extended Data
Fig. 3f). To define this survival advantage further, we evaluated the
engraftment kinetics after one PMT of WT BMDMs in Csf2rb−/− mice.
CD131+ cells increased steadily from zero to 69.0 ± 2.5% of BAL cells
(Fig. 2d) synchronous with a smooth decline in pulmonary GM-CSF to
near normal (Fig. 2e). Similarly, BAL turbidity declined with the increase
in CD131+ alveolar macrophages (Fig. 2f). One year after PMT, CD131+
cells were present (Fig. 1b), CD131 protein (encoded by Csf2rb) was
detectable in alveolar macrophages (Extended Data Fig. 3a), and Csf2rb
mRNA in BAL cells from PMT-treated Csf2rb−/− mice was only slightly
less than in WT and undetectable in untreated Csf2rb−/− BAL cells
(Fig. 2g). Importantly, numbers of CD131+ alveolar macrophages in
PMT-treated Csf2rb−/− and untreated WT mice were similar 1 year after
PMT (Fig. 2h). These results demonstrate that WT macrophages had
a selective survival advantage over Csf2rb−/− macrophages and that
after PMT into Csf2rb−/− mice, they proliferated in vivo at a rate that
slowed over time synchronous with reduction in pulmonary GM-CSF,
replaced dysfunctional Csf2rb−/− alveolar macrophages, and resulted
in numbers of CD131+, GM-CSF-responsive alveolar macrophages
similar to WT mice.

Figure 2 | Pharmacokinetics and pharmacodynamics of PMT in Csf2rb−/−
mice. a, Competitive proliferation of WT and Csf2rb−/− BMDMs co-cultured
with GM-CSF and M-CSF (n = 3 plates per point). b, Quantification of
GFP+ BAL cells 2 months after PMT of Lys-MGFP BMDMs into WT (n = 3) or
Csf2rb−/− (n = 6) mice. c, Quantification of Ki67+ Lys-MGFP cells in Csf2rb−/−
mice (n = 3) 1 or 12 months after PMT. d-f, Csf2rb−/− mice received PMT
of WT BMDMs and were evaluated at the indicated times to quantify CD131+
BAL cells (d), BAL GM-CSF concentration (e) and BAL turbidity (f).
Exponential regression (± prediction bands), R² = 0.943 (d), R² = 0.819 (e),
R² = 0.958 (f). Data are mean ± s.e.m. for 3-7 mice per group. g, Csf2rb mRNA
in BAL cells from Csf2rb−/− mice 1 year after PMT, or untreated, age-matched
control mice (n = 6). h, Number of BAL cells (open bars) or CD131+
alveolar macrophages (filled bars) in Csf2rb−/− mice 1 year after PMT (n = 5)
or untreated WT mice (n = 10). Data are mean ± s.e.m. *P < 0.05,
***P < 0.001; NS, not significant.
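
The exponential regressions reported for the engraftment kinetics (Fig. 2d-f) can be illustrated with a minimal sketch. The paper does not state the fitting software or the exact model, so the single-exponential form y = a·exp(b·t), the log-linear least-squares fit, the function name `fit_exponential` and the example values are all assumptions made for illustration; R² here is computed on the log scale.

```python
import math

def fit_exponential(t, y):
    """Fit y = a * exp(b * t) by ordinary least squares on ln(y).

    Hypothetical helper, not the authors' procedure.
    Returns (a, b, r_squared), with r_squared computed on the log scale.
    """
    ln_y = [math.log(v) for v in y]          # requires all y > 0
    n = len(t)
    t_mean = sum(t) / n
    ln_mean = sum(ln_y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (li - ln_mean) for ti, li in zip(t, ln_y))
    b = sxy / sxx                            # slope = rate constant
    ln_a = ln_mean - b * t_mean              # intercept = ln(a)
    ss_res = sum((li - (ln_a + b * ti)) ** 2 for ti, li in zip(t, ln_y))
    ss_tot = sum((li - ln_mean) ** 2 for li in ln_y)
    return math.exp(ln_a), b, 1.0 - ss_res / ss_tot

# Made-up turbidity readings (OD600) at months after PMT, for illustration only.
months = [0, 1, 2, 4, 8, 12]
turbidity = [2.6, 1.9, 1.5, 1.0, 0.5, 0.3]
a, b, r2 = fit_exponential(months, turbidity)
print(f"a = {a:.2f}, b = {b:.3f} per month, R^2 = {r2:.3f}")
```

A negative `b` corresponds to a declining quantity such as turbidity or GM-CSF; a rising quantity that approaches a plateau, such as the CD131+ fraction, would need a different exponential form.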


Macrophage characterization after PMT 


The fate of macrophages after PMT was evaluated to determine their spatial
distribution, phenotype and gene expression profile. Intra-pulmonary
localization was evaluated 1 year after PMT of WT Lys-MGFP BMDMs by
fluorescence microscopy to identify CD68+GFP+ (that is, PMT-derived)
macrophages, which revealed that 88.9 ± 0.87% were intra-alveolar and
11.1 ± 0.87% were interstitial (Extended Data Fig. 3h). GFP
immunohistochemical staining was done to eliminate potential interference from
autofluorescence and confirmed these results; 90.5 ± 1.1% PMT-derived
macrophages were intra-alveolar and 9.4 ± 1.1% were interstitial (Fig. 3a, b
and Extended Data Fig. 3i). Localization was done in similarly treated
mice by flow cytometry to detect GFP+ cells 2 months (not shown) or
1 year after PMT (Fig. 3c and Extended Data Fig. 4a, b) and by PCR
amplification of Lys-MGFP transgene-specific DNA (Extended Data
Fig. 4c), all of which showed that PMT-derived cells were present in
the lungs but not detected in blood, bone marrow, or spleen. One year
after PMT of CD45.1+ WT BMDMs into CD45.2+ Csf2rb−/− mice,
flow cytometric detection of CD45.1+ cells confirmed these findings
(Extended Data Fig. 4e-g). Results show that the transplanted
macrophages remained in the lungs, primarily within the intra-alveolar space.
The effects of the lung environment on the phenotype of transplanted
macrophages were evaluated by measuring cell-surface markers. One
year after PMT of WT Lys-MGFP BMDMs into Csf2rb−/− mice,
PMT-derived alveolar macrophages comprised 68.7 ± 6.5% of BAL cells and
had converted from CD11bHiSiglecFLow to CD11bLowSiglecFHi,
similar to the phenotype of WT alveolar macrophages and different from
Csf2rb−/− mice at the point of PMT (CD11bHiSiglecFLow) (Fig. 3c, d).
Similarly, one year after PMT of WT CD45.1+ BMDMs into CD45.2+
Csf2rb−/− mice, CD45.1+ alveolar macrophages comprised 63.6 ± 12.1%
of BAL cells and had undergone the same phenotypic conversion
(Extended Data Fig. 4h).

Figure 3 | Localization and phenotype of transplanted macrophages.
Lys-MGFP BMDMs were transplanted into Csf2rb−/− mice and evaluated after
1 year. a, Immunostained lung showing GFP+ cells. Scale bars, left, 200 μm;
right, 20 μm. b, Localization of GFP+ macrophages to intra-alveolar (A) and
interstitial (I) spaces (n = 6). c, GFP+ BAL cells identified by flow cytometry.
d, Phenotypic analysis of F4/80+ BMDMs before PMT, and alveolar
macrophages from PMT-treated Csf2rb−/− mice, or untreated, age-matched
WT or Csf2rb−/− mice (n = 6 per group). Data are mean ± s.e.m.

To determine the effects on gene expression, we performed genome-wide
expression profiling on alveolar macrophages from Csf2rb−/− mice
1 year after PMT of WT BMDMs and compared to results for untreated,
age-matched WT or Csf2rb−/− mice. Unsupervised analysis indicated
marked co-clustering between PMT-treated Csf2rb−/− and WT mice
while Csf2rb−/− mice clustered separately (Fig. 4). Expression of genes
regulated by GM-CSF was reduced in Csf2rb−/− mice and restored by
PMT (Fig. 4 and Extended Data Fig. 5a). Of 776 genes for which
expression was disrupted in Csf2rb−/− mice, PMT normalized expression of
600 including 80% of genes upregulated and 76% of genes
downregulated in Csf2rb−/− compared to WT mice (Extended Data Fig. 5b).
Supervised Gene Ontology (GO) and detailed KEGG pathway analysis
revealed that genes in multiple pathways involved in lipid metabolism,
cellular proliferation, apoptosis and host defence were coordinately
downregulated in Csf2rb−/− mice, and normalized by PMT (Extended Data
Fig. 5c, d). Results for multiple genes important in lipid metabolism
(Abcg1, Nr1h3, Olr1, Lepr, Fabp1, Lipf, Abca1, Apoe, Apoc2, Pla2g7)
were validated using separate samples (Extended Data Fig. 5e).


Efficacy of gene therapy by PMT 

Since PMT in humans would probably employ autologous, gene-corrected
HSPC-derived macrophages, we evaluated PMT of Csf2rb−/−
macrophages derived from LSK cells after lentiviral vector (LV)-mediated Csf2rb
cDNA expression (Fig. 5a). Csf2rb gene-corrected (GM-R-LV-transduced)
or sham-treated (GFP-LV-transduced) Csf2rb−/− and non-transduced
WT LSK-derived cells all had macrophage morphology, expressed CD68
(Extended Data Fig. 6a) and were F4/80+CD11bHiCD11c+ (not shown).
In contrast, only WT and GM-R-LV-transduced Csf2rb−/− cells were
CD131+ and only lentiviral-vector-transduced cells were GFP+ (Extended
Data Fig. 6a). GM-R-LV restored GM-CSF signalling in Csf2rb−/−
macrophages (Fig. 5b and Extended Data Fig. 6b). Two months after PMT
into Csf2rb−/− mice, GM-CSF receptor-β was detected on alveolar
macrophages only from mice receiving gene-corrected Csf2rb−/− or WT
macrophages (Extended Data Fig. 6c). The efficacy using gene-corrected
Csf2rb−/− or WT cells was equivalent as demonstrated by a similar
degree of improvement in BAL appearance (Extended Data Fig. 6d),
BAL turbidity, SP-D and biomarkers of hPAP (Fig. 5c, d). Furthermore,
gene-corrected BMDMs localized to the lung (Extended Data Fig. 4d
and Fig. 6e) and underwent phenotypic conversion to CD11bLow
(Extended Data Fig. 6f). The long-term efficacy of gene-corrected
macrophages 1 year after PMT was demonstrated by marked reduction in BAL
turbidity, SP-D and BAL fluid biomarkers of hPAP (Fig. 5c, d). These
results demonstrate that PMT of gene-corrected macrophages had a
therapeutic effect on hPAP in Csf2rb−/− mice equivalent to that of WT
macrophages and was durable, lasting at least 1 year.


Safety of PMT therapy in Csf2rb~'~ mice 


PMT was well tolerated and without adverse effects. One year after PMT,
there were no haematological abnormalities (Extended Data Table 4),
cellular inflammation or pulmonary fibrosis in mice receiving PMT of
WT (Fig. 1d) or gene-corrected macrophages (not shown). Csf2rb−/−
mice had trivial elevations of IL-6 and TNF-α in BAL that were reduced
by PMT of WT macrophages (Extended Data Table 4). These data
identify no safety concerns for PMT therapy of hPAP in Csf2rb−/− mice.

Figure 4 | Microarray analysis of alveolar macrophages 1 year after PMT.
Unsupervised hierarchical clustering dendrogram and heat map of selected
GM-CSF-regulated genes in PMT-treated Csf2rb−/− mice or untreated,
age-matched WT or Csf2rb−/− mice (3 per group). Pearson correlation
coefficient (PCC).

Figure 5 | Effects of PMT of gene-corrected macrophages on hPAP severity
and biomarkers. Csf2rb−/− mice received PMT of non-transduced WT or
lentiviral-vector-transduced Csf2rb−/− macrophages and were evaluated
after 2 months (2M) or 1 year (1Y) (with untreated, age-matched Csf2rb−/−
mice). The key indicates PMT cells used, previous lentiviral vector treatment,
and time after PMT analysis was performed. a, Lentiviral vector schematics.
b, GM-CSF signalling measured by the STAT5 phosphorylation index
(STAT5-PI) in the indicated cells before PMT. c, BAL turbidity and SP-D
concentration. d, BAL biomarkers. Mean ± s.e.m. of n = 3 (b) or 5-10 (c, d)
mice per group. *P < 0.05, **P < 0.01.


Discussion 


Multiple lines of evidence indicate that the high efficacy of PMT
therapy of hPAP in Csf2rb−/− mice was attributable to a selective survival
advantage conferred by increased pulmonary GM-CSF to alveolar
macrophages bearing functional GM-CSF receptors. However, pulmonary
surfactant remained slightly increased 1 year after a single PMT. This
could be because the treatment time was too short or exceeded the
durability of the clinical benefit, or due to the continued presence of Csf2rb−/−
alveolar macrophages despite engraftment of GM-CSF-responsive
macrophages. The latter is likely due to ongoing Csf2rb−/− myelopoiesis,
pulmonary recruitment of monocytes and local proliferation, and
GM-CSF-independent survival as occurs in untreated Csf2rb−/− mice. Csf2rb−/−
macrophages may provide a ‘protected intracellular niche’ for
surfactant accumulation since without GM-CSF, alveolar macrophages
internalize but cannot clear surfactant. Another factor may be reduction of
the survival advantage over time, that is, reduced pulmonary GM-CSF
(driving WT cell proliferation) and reduced surfactant burden (driving
surfactant-engorgement-related Csf2rb−/− cell death). Notwithstanding
these points, a single PMT of GM-CSF-responsive cells cleared ~90%
of the abnormal surfactant accumulation for at least one year.

The feasibility of translating PMT therapy to humans with hPAP is
supported by the safety and efficacy of PMT in Csf2rb−/− mice and the
striking similarity of hPAP in mice and humans. Macrophages could
be delivered by bronchoscopic instillation without endotracheal
intubation, general anaesthesia, or mechanical ventilation, which are required
for whole lung lavage and increase risk. Preparative myeloablation would
be unnecessary and use of autologous, gene-corrected cells would
eliminate the need for immunosuppression; both are required for BMT.
PMT may also be possible with gene-corrected, inducible
pluripotent-cell-derived macrophages recently prepared from children with hPAP.
However, formal preclinical toxicology studies related to PMT and to
gene transfer will be needed before this approach can be tested in humans.
Since pulmonary GM-CSF is critical to lung host defence and clearance
of a broad range of microorganisms, PMT may also be useful in
treating serious lung infections. Indeed, pulmonary administration of
macrophages constitutively expressing IFN-γ improved host defence in SCID
mice. In such applications, inhaled GM-CSF could be used to promote
survival of transplanted macrophages.

Figure 6 | Proposed homeostatic reciprocal feedback mechanism by which
pulmonary GM-CSF regulates alveolar macrophage population size in vivo.

Identification of a homeostatic mechanism by which pulmonary 
GM-CSF regulates alveolar macrophage population size (Fig. 6) was an 
unexpected but important finding. Its existence is supported by recent 
fate-mapping studies indicating that tissue-resident alveolar macrophages 
derive before birth and self-maintain by local replication independent 
of circulating monocytes at steady state.

The concept of alveolar macrophages (and other tissue-resident mac- 
rophages) as short-lived, terminally differentiated, non-dividing repre- 
sentatives of a unified mononuclear phagocyte system replenished via 
monocyte intermediates has evolved considerably since its inception.
Alveolar macrophage half-life was initially estimated at 2 weeks based
on studies of repopulation after lethal irradiation and allogeneic BMT.
Improved detection methods using GFP+ cells increased the estimate
to 30 days. Shielding the thorax during irradiation increased it further
to 8 months. Our data, obtained without irradiation or myeloablation,
show that macrophages transplanted directly into the respiratory tract 
persisted for one-and-a-half years. A caveat of such estimates is their 
inability to discern if persistence is due to prolonged survival or replication. 

Normally, alveolar macrophages are phenotypically CD11bLowSiglecFHi
while other macrophage populations are CD11bHiSiglecFLow.
Surprisingly, WT BMDMs cultured in GM-CSF and M-CSF were
CD11bHiSiglecFLow in vitro but converted to CD11bLowSiglecFHi after PMT. In
contrast, BMDMs instilled in the peritoneum adopt the CD11bHi
phenotype of peritoneal macrophages. These changes agree with
gene-expression profiling studies and indicate that local microenvironments
provide critical ‘phenotypically instructive’ cues that direct development
of tissue-resident macrophage populations. Our results show for alveolar
macrophages that GM-CSF provides one such phenotypic cue while the
lung environment provides another critical, albeit unidentified, cue.

The limitations of our study include the fact that it did not establish 
a minimum effective dose, a maximum tolerated dose, or a significant 
dose-response relationship. BMDMs were capable of clearing surfactant 
before transplantation but results did not determine whether ‘lung- 
conditioning’ further increased their clearance capacity. While the macro- 
phages used for PMT contained very few progenitors, it is theoretically 
possible that clonal expansion of a progenitor subpopulation may have 
contributed to therapeutic efficacy and, if so, potential clonal shrinkage 
may have contributed to loss of benefit at later times. Thus, additional 
studies are needed to further confirm the identity of effector cells and 
precise pharmacokinetics and durability of the therapeutic benefit. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.
METHODS


Mice. All mice were bred, housed and studied in the Cincinnati Children’s Research
Foundation Vivarium using protocols approved by the Institutional Animal Care
and Use Committee. Csf2rb gene-targeted (Csf2rb−/−) mice, and mice expressing
EGFP knocked into the lysozyme M gene (Lys-MGFP mice), were all generated
previously and backcrossed onto the C57BL/6 background. C57BL/6 mice (referred
to as wild type or WT mice) were purchased from Charles River. B6.SJL-Ptprca
Pepcb/BoyJ (CD45.1+) mice were from Jackson Laboratory.

Lung histology and immunohistochemistry. Animals were killed by intraper- 
itoneal pentobarbital administration and exsanguination by aortic transection. The 
trachea was exposed by a vertical midline skin incision, cannulated through a small 
transverse incision in its ventral surface away from the thoracic inlet, inflated with 
fixative (PBS, pH 7.4, containing 4% paraformaldehyde) under a hydrostatic head 
of 25 cm and ligated with suture while retracting the cannula to seal the lung under 
pressure. The sternum and diaphragm were transected sagittally, retracted laterally, 
and the lungs and heart separated from the chest wall by blunt dissection to avoid 
puncturing the mediastinal pleura and removed from the chest. The intact tissue 
block containing the heart, lungs and ligated trachea was submerged in fixative and 
kept at 4 °C for 24h. After fixation, the lung lobes were divided, removed from the 
tissue block, cut into ~2-mm-thick slices along the long axis, washed in cold PBS, 
dehydrated, embedded in paraffin, and 5-μm-thick sections were cut and stained
with haematoxylin and eosin (H&E), periodic acid-Schiff reagent (PAS), or Masson’s
trichrome as previously described. Immunostaining for surfactant protein B (SP-B)
was done by incubating slides with rabbit anti-SP-B polyclonal antibody (diluted
1:500, Seven Hills Bioreagents, Cincinnati, OH) and Vectastain ABC anti-rabbit
immunohistochemical horseradish peroxidase kit (Vector Labs, Inc., Burlingame,
CA) and counterstaining with haematoxylin as described. To prepare frozen lung
sections, the lungs were inflation fixed in situ as described above and then the heart
and lungs were removed en bloc and cryoprotected by sequential immersion in PBS
containing increasing sucrose concentrations (10%, 15% and 20%; 8-12 h, 4 °C,
at each concentration). The lungs were then embedded in Tissue-Tek OCT
compound (Sakura Finetek, Torrance, CA), frozen and stored at −80 °C until use. Serial
6 μm sections were prepared for immunostaining or evaluation of GFP+ cells. Lung
sections and sedimented lung cells were examined by light microscopy using a Zeiss 
Axioplan 2 microscope (Zeiss) equipped with AxioVision software (Zeiss). 
Collection, handling and evaluation of bronchoalveolar lavage fluid and cells.
Epithelial lining fluid and non-adherent cells were collected from the lung surface of
mice by bronchoalveolar lavage (BAL) as described and processed immediately.
Briefly, five 1-ml aliquots were instilled and immediately recovered per mouse and
combined, resulting in a BAL recovery of 93.9 ± 1.2% per mouse (BAL recovery data
for 10 mice evaluated randomly). Fresh BAL specimens were photographed, and
photographed again after sediment had formed during overnight incubation at 4 °C.
The turbidity of BAL was determined as described. Briefly, after gently
mixing to ensure a homogeneous suspension of BAL, a 250 μl aliquot was diluted
into 750 μl PBS, the optical density was measured at a wavelength of 600 nm,
and the result was multiplied by the dilution factor. The total number of BAL cells
recovered from each mouse was determined by counting cells in an aliquot of known
volume using a haemocytometer and multiplying the result by the total volume of
BAL and dividing by the volume of the aliquot used for counting. BAL cytology was
evaluated in aliquots (~50,000 cells) after sedimentation (Cytospin, Shandon, Inc.;
500 r.p.m., 7 min, room temperature) onto glass slides and staining with DiffQuick,
PAS, or oil red O (all from Fisher Scientific) as described. The cell differential was
determined by microscopic examination of DiffQuick stained cells and the total
number of alveolar macrophages per mouse was determined by multiplying the
percentage of alveolar macrophages in BAL cells by the total number of BAL cells
recovered. BAL fluid and cells were separated by low-speed centrifugation (285g,
10 min, room temperature) and stored at −80 °C until use (BAL fluid) or
immediately evaluated (referred to as BAL cells) or used to isolate alveolar macrophages
(see below). Primary alveolar macrophages were purified by brief adherence of BAL
cells to plastic as described. Viability was evaluated by Trypan blue exclusion and
was ≥95%.
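
The BAL calculations described above are simple arithmetic and can be sketched as follows. The function and parameter names are hypothetical and chosen for illustration; only the volumes (250 μl sample into 750 μl PBS, giving a dilution factor of 4) and the order of operations come from the text.

```python
def bal_turbidity(od600_diluted, sample_ul=250.0, diluent_ul=750.0):
    """Turbidity of undiluted BAL: measured OD600 times the dilution factor."""
    dilution_factor = (sample_ul + diluent_ul) / sample_ul  # (250 + 750) / 250 = 4
    return od600_diluted * dilution_factor

def total_bal_cells(cells_counted, aliquot_volume_ml, total_bal_volume_ml):
    """Cells counted in an aliquot, scaled up to the whole recovered BAL volume."""
    return cells_counted * total_bal_volume_ml / aliquot_volume_ml

def alveolar_macrophages_per_mouse(percent_macrophages, total_cells):
    """Differential percentage of alveolar macrophages times total BAL cells."""
    return percent_macrophages / 100.0 * total_cells

# Illustrative values only (not data from the study).
print(bal_turbidity(0.5))                               # OD600 of the diluted sample
cells = total_bal_cells(100, 0.5, 5.0)                  # 100 cells counted in 0.5 ml of 5 ml
print(alveolar_macrophages_per_mouse(90.0, cells))
```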

ELISA. The concentration of surfactant protein D (SP-D) in BAL fluid was
measured by enzyme-linked immunosorbent assay (ELISA) as we described. The
concentration of several cytokines (GM-CSF, M-CSF, MCP-1, IL-1, IL-6, TNF-α)
in BAL fluid and erythropoietin in serum was measured by ELISA (Mouse
Quantikine Kits, R&D Systems) as described.

Quantitative RT-PCR. Total RNA was isolated from alveolar macrophages using
TRIzol Reagent (Life Technologies, Carlsbad, CA) and then used to purify mRNA
using RNeasy (Qiagen, Valencia, CA), both as directed by the manufacturers. Purified
mRNA was used to synthesize cDNA using the Invitrogen SuperScript III
First-Strand Synthesis System (Life Technologies). Standard quantitative RT-PCR
(qRT-PCR) was performed as previously described on an Applied Biosystems 7300
Real-Time PCR System (Life Technologies) to measure transcript abundance using
TaqMan oligonucleotide primer sets (all from Life Technologies) (Extended Data
Table 1). Expression of target genes was normalized to the expression of 18S RNA.
Data for each gene are shown as the fold change relative to the mean of results for
wild-type mice.
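
The normalization just described (target gene relative to 18S RNA, then expressed as fold change against the wild-type mean) can be sketched as below. The text does not give the formula, so the use of threshold-cycle (Ct) values with a 2^−ΔCt relative quantity is an assumption, as are the function names and the example numbers.

```python
from statistics import mean

def relative_quantity(ct_target, ct_18s):
    """Relative abundance of the target normalized to 18S (assumed 2^-dCt form)."""
    return 2.0 ** -(ct_target - ct_18s)

def fold_change_vs_wt(sample_cts, wt_cts):
    """Fold change of each sample relative to the mean wild-type value.

    Each argument is a list of (ct_target, ct_18s) pairs, one pair per mouse.
    """
    wt_mean = mean(relative_quantity(t, r) for t, r in wt_cts)
    return [relative_quantity(t, r) / wt_mean for t, r in sample_cts]

# Hypothetical Ct values for illustration only.
wild_type = [(25.0, 10.0), (25.2, 10.1), (24.9, 9.9)]
treated = [(24.1, 10.0), (23.8, 10.2)]
print(fold_change_vs_wt(treated, wild_type))
```

By construction, the wild-type group averages to a fold change of 1, which matches the way the data are presented in the figures.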
Bone-marrow-derived macrophages (BMDMs). Bone marrow cells were obtained
from 6-8-week-old WT, Csf2rb−/−, or Lys-MGFP mice by isolating and flushing
tibias and femurs with DMEM (Life Technologies) containing 10% heat-inactivated
FBS, 50 U ml−1 penicillin, and 50 μg ml−1 streptomycin. After red blood cells were
removed with BD Pharm Lyse (BD Biosciences), mononuclear cells were isolated by
centrifugation on Ficoll-Paque (GE Healthcare) at room temperature for 30 min,
washed, re-suspended in DMEM containing 10% heat-inactivated FBS, 50 U ml−1
penicillin, 50 μg ml−1 streptomycin, 10 ng ml−1 GM-CSF and 5 ng ml−1 M-CSF (both
from R&D Systems), seeded into plastic dishes (Falcon) at a density of ~27 x 10°
cells per 10 cm dish (1 per mouse) and cultured overnight at 37 °C in a humidified
environment containing 5% CO2. The next day, non- or weakly-adherent cells were
recovered, transferred to a new dish and cultured under the conditions just described
to permit differentiation and expansion of macrophages; firmly adherent cells were
discarded. After 2 days the culture medium was changed and after 5 days from
seeding, adherent bone-marrow-derived macrophages were gently washed with PBS,
harvested by brief exposure to trypsin-EDTA (Life Technologies), washed, and
used for experiments. The cell purity was high as indicated by the percentage of
CD68+ and F4/80+ cells (96.6 ± 0.3%, 95.4 ± 1.3%, respectively, not shown).
Some experiments used lineage-negative (Lin−) c-Kit+Sca-1+ (LSK) cells which
were obtained from mouse bone marrow as described. Briefly, bone marrow from
6-8-week-old WT or Csf2rb−/− mice was collected as above and lineage depleted
with biotinylated lineage antibodies CD5 (53-7.3), CD8a (53-6.7), CD45R/B220
(RA3-6B2), CD11b (M1/70), Gr-1 (RB6-8C5), and TER-119 (TER-119) (BD
Biosciences), and magnetic beads (Dynabeads sheep anti-rat IgG) (Life Technologies).
After removing lineage-positive cells, the remaining cells were stained with 7-AAD,
FITC-Streptavidin (BD Biosciences) and antibodies to Sca-1 (D7) and c-Kit (2B8)
(BD Biosciences). Then, Lin− c-Kit+ Sca-1+ 7-AAD− cells were isolated by cell
sorting on a FACSAria (BD Biosciences) and used immediately in experiments. Cell
morphology was confirmed by DiffQuick staining of sedimented cells (Cytospin,
Shandon) and viability was measured by Trypan blue exclusion as described and
found to be ≥95%. In some experiments, cells were immunostained for CD68
(FA-11) (AbD Serotec), counterstained with DAPI as described, and examined by light
microscopy using a Zeiss Axioplan 2 microscope (Zeiss) equipped with AxioVision
software (Zeiss).
Colony forming cell (CFC) assay. BMDMs or Lin− bone marrow cells were
evaluated for the presence of haematopoietic progenitors capable of forming colonies
in semisolid medium in response to cytokine stimulation as previously described.
Briefly, fresh Lin− bone marrow cells or BMDMs after induced differentiation into
macrophages for 5 days were seeded into standard mouse methylcellulose media
supplemented with insulin, transferrin, SCF, IL-3, IL-6 and erythropoietin (HSC007,
R&D Systems, Minneapolis, MN). After 7 days in culture, colonies of ≥50 cells were
visible and were examined morphologically using whole-plate stack images acquired
using an AXIO-Z1 microscope and AXIO-vision software (Zeiss, Jena, Germany)
to identify and enumerate burst-forming erythroid progenitors (BFU-E),
colony-forming myeloid progenitors (CFU-GM) and the multi-potential progenitors
(CFU-GEMM).
Surfactant clearance assay. BMDMs were evaluated functionally to demonstrate
their ability to clear human surfactant as we previously reported. Briefly, BMDMs
from either WT or Csf2rb−/− mice were seeded into 12-well plates (4 X 10° cells
per well) in DMEM, 10% FBS, 10 ng ml−1 GM-CSF, 5 ng ml−1 M-CSF. Human
surfactant recovered by lavage of a patient with PAP was added to the media and
cells were incubated for 24 h to permit surfactant uptake into cells and then washed
to remove extracellular surfactant. Cells were incubated for 24 h to permit
macrophages to clear internalized surfactant. Cells collected before, immediately after
surfactant exposure or 24 h after the completion of surfactant exposure were
sedimented onto slides by cytocentrifugation (Shandon), stained with oil red O, and
counterstained with haematoxylin. Oil red O staining was evaluated in ≥10
random 20× microscopic fields for each sample as described.
Pulmonary macrophage transplantation (PMT). BMDMs or LSK cell-derived
macrophages were administered directly into the lungs of 8-week-old mice using
a relatively non-invasive endotracheal instillation method described previously.
Briefly, mice received light anaesthesia by isoflurane inhalation and were suspended
on a flat board by a rubber band across the upper incisors and placed in a
semi-recumbent (45°) position with the ventral surface and rostrum facing upwards.
Using a curved blade Kelly forceps, the tongue was gently and partially retracted
rostrally, and 50 μl of PBS containing the macrophages to be administered was
placed in the back of the oral cavity using a micropipette. The PBS and cells were
inhaled into the lungs by subsequent respiratory efforts under direct visualization.
Mice were then observed while recovering from anaesthesia to ensure continued
retention of the administered fluid and cells and then returned to their cages for
routine care and handling. Because the efficacy of PMT at a dose of two million
macrophages was optimal, this dose given as one administration was used throughout
the study except where noted (Extended Data Tables 2 and 3). Age-matched mice
were used in all experiments to control for the degree of lung disease severity.
Flow cytometry. BAL cells were purified by centrifugation on Percoll to remove
surfactant and debris. BAL cells or BMDMs were immunostained to detect CD115
(AFS98), F4/80 (BM8) (eBioscience), CD3 (145-2C11), CD11b (M1/70), CD11c (HL3),
CD16/32 (2.4G2), CD19 (1D3), CD64 (X54-5/7.1.1), CD131 (JORO50), Ly6G (1A8),
CD45.1 (A20), CD45.2 (104), MHC class II (I-A/I-E) (M5/114.15.2), SiglecF
(E50-2440) (BD Biosciences), CD14 (Sa14-2) (BioLegend), CD68 (FA-11) (AbD Serotec),
and MerTK (108928) (R&D Systems), as previously described, evaluated by flow
cytometry using FACSCalibur or FACSCanto flow cytometers (both from BD
Biosciences, San Jose, CA), and the results were analysed using CellQuest, FACSDiva
(BD Biosciences), or FlowJo software (Tree Star). For intracellular staining of CD68,
Leucoperm (AbD Serotec) was used as directed by the manufacturer.
Quantification of CD131+ alveolar macrophages. The percentage of BAL cells
expressing the GM-CSF receptor β-subunit was determined by immunostaining.
Briefly, aliquots of BAL cells were sedimented onto glass slides and incubated (10 min,
room temperature) in fixative (PBS containing 4% paraformaldehyde), washed with
PBS and incubated (4 °C, overnight) with anti-mouse GM-CSF receptor-β (CD131)
antibody (sc-678) (Santa Cruz) diluted 1:400 in PBST (PBS containing 2.5% (w/v)
Triton X-100 and 5% (v/v) goat serum). After incubation, slides were rinsed five
times in PBST and incubated (room temperature, 1 h) with the secondary detection
antibody (Alexa-Fluor-594-conjugated, anti-rabbit IgG (Life Technologies)) and
counterstained with 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI) (Vector
Labs, Burlingame, CA). Cells were examined using a Zeiss Axioplan 2 microscope
(Zeiss) equipped with AxioVision software (Zeiss). The percentage of CD131+
BAL cells was determined by first counting the CD131+ and DAPI+ cells in five (or
more) random 20× microscopic fields for each BAL sample. Then, the number of
CD131+ cells in each field was divided by the number of DAPI+ cells in the same
field and results for all fields examined were averaged and multiplied by 100. The
total number of CD131+ cells per mouse was calculated by multiplying the percen-
tage of CD131+ cells by the total number of BAL cells recovered from each mouse.
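
The field-averaging arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names and the example counts are hypothetical, and the sketch assumes that the percentage is converted back to a fraction before it is multiplied by the total BAL cell count.

```python
from statistics import mean


def percent_cd131_positive(fields):
    """Average the per-field CD131+/DAPI+ ratio and express it as a percentage.

    `fields` is a list of (cd131_positive, dapi_positive) counts, one pair per
    microscopic field. The text calls for five or more fields per sample.
    """
    if len(fields) < 5:
        raise ValueError("five or more fields are required per sample")
    ratios = []
    for cd131_positive, dapi_positive in fields:
        if dapi_positive <= 0:
            raise ValueError("each field needs at least one DAPI+ cell")
        ratios.append(cd131_positive / dapi_positive)
    return mean(ratios) * 100


def total_cd131_positive(percent_positive, total_bal_cells):
    """Scale the percentage (assumed to be on a 0-100 scale) to cells per mouse."""
    return percent_positive / 100 * total_bal_cells


if __name__ == "__main__":
    # Hypothetical counts for five fields of one BAL sample.
    counts = [(10, 100), (20, 100), (30, 100), (40, 100), (50, 100)]
    pct = percent_cd131_positive(counts)
    print(pct, total_cd131_positive(pct, 1_000_000))
```

Averaging the per-field ratios, as the text describes, weights every field equally regardless of how many cells it contains; pooling all counts before dividing would be a different calculation.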
STAT5 phosphorylation index assay. GM-CSF bioactivity in BAL fluid and GM-
CSF receptor function in transduced or WT macrophages were evaluated by measur-
ing GM-CSF-stimulated phosphorylation of STAT5 in BMDMs or LSK cell-derived
macrophages using anti-phospho-STAT5 antibody (47/Stat5(pY694)) (BD Bio-
sciences) by flow cytometry as previously reported. The STAT5 phosphorylation
index (STAT5-PI) was calculated as the mean fluorescence intensity of phosphor-
ylated STAT5 staining in GM-CSF-stimulated cells minus that of non-stimulated
cells, divided by that of non-stimulated cells, and multiplied by 100. In experi-
ments to quantify GM-CSF bioactivity, WT BMDMs were incubated in BAL fluid
containing anti-GM-CSF (22E9, eBioscience) or isotype control antibody (10 µg ml⁻¹)
for 30 min and then evaluated.
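
The STAT5-PI formula given above translates directly into code. The following is a minimal sketch under the assumption that the two inputs are mean fluorescence intensities from the same experiment; the function name and example values are hypothetical.

```python
def stat5_phosphorylation_index(mfi_stimulated, mfi_unstimulated):
    """STAT5-PI = (stimulated MFI - unstimulated MFI) / unstimulated MFI x 100."""
    if mfi_unstimulated <= 0:
        raise ValueError("unstimulated MFI must be positive")
    return (mfi_stimulated - mfi_unstimulated) / mfi_unstimulated * 100


if __name__ == "__main__":
    # Hypothetical intensities: a threefold rise over baseline gives an index of 200.
    print(stat5_phosphorylation_index(300.0, 100.0))
```

An index of 0 therefore means no GM-CSF-induced increase over baseline, and an index of 100 means the signal doubled.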

Evaluation of macrophage proliferation. In vitro mixed-cell proliferation assay. 
CD45.1+ WT LSK-derived cells and CD45.2+ Csf2rb−/− LSK-derived cells were
isolated, seeded into dishes at an initial ratio of 1:3, respectively, and cultured in
DMEM containing 10% bovine calf serum, 1% penicillin/streptomycin, GM-CSF
(10 ng ml⁻¹) and M-CSF (5 ng ml⁻¹). Cells were collected at 1, 7, 14 and 18 days,
immunostained with anti-murine CD45.1, anti-CD45.2 and evaluated by flow cytom- 
etry to determine the percentage of each cell type at these times. 

In vivo evaluation of transplanted macrophage proliferation. Frozen lung tissue 
sections were immunostained with anti-Ki67 antibody (Roche) and examined using 
a Zeiss Axioplan 2 microscope (Zeiss). The percentage of proliferating PMT-derived 
cells was determined by enumerating GFP+ Ki67+ cells among total GFP+ cells
in ≥7 random 20× microscopic fields for each sample. To confirm the specificity
of Ki67 immunostaining, paraffin-embedded sections or WT alveolar macrophages 
isolated by BAL and adherence were also stained with Ki67 and examined by light 
microscopy. 

Western blotting. Detection of GM-CSF receptor-β and actin by western blot-
ting was done as previously described with the following modifications. Briefly,
primary alveolar macrophages (0.5 × 10⁶ per condition) or cultured BMDMs
(1 × 10⁶ per condition) were collected by low-speed centrifugation (285g, 4 °C,
10 min) and the pellets incubated on ice for 30 min in 200 µl of lysis buffer (50 mM
Tris-HCl pH 8.0, 150 mM NaCl, 1% (v/v) Nonidet P-40, 0.5% (w/v) sodium deox-
ycholate, 0.1% (w/v) sodium dodecyl sulphate (SDS), 0.004% (w/v) sodium azide)
containing 2% (v/v) proteinase inhibitor cocktail (phenyl-methyl-sulphonyl-fluoride
and sodium orthovanadate; Santa Cruz). Insoluble debris was removed by centri-
fugation (10,000g, 4 °C, 15 min) and the supernatant transferred to a clean poly-
propylene tube. An equal volume of Laemmli sample loading buffer (Bio-Rad, CA)
was added and the tubes were capped tightly, vortexed briefly and boiled for 5 min.
Samples were then separated by electrophoresis on SDS-polyacrylamide gradient
(4–12%) gels (Invitrogen) under reducing conditions. Separated proteins were
transferred to PVDF membranes by electro-blotting, and the membranes were
incubated in blotting solution (50 mM Tris-HCl pH 8.0, 150 mM NaCl, 5% (w/v)
non-fat dry milk (Kroger, Cincinnati, OH), 0.1% (v/v) Tween 20; 4 °C, overnight)
to block non-specific binding. Diluted primary detection antibody (see below)
was added and the membranes were incubated for 2 h at room temperature and
then washed in TBST (50 mM Tris-HCl pH 8.0, 150 mM NaCl, 0.1% (v/v)
Tween 20). Membranes were then incubated with the secondary HRP-conjugated
detection antibody in blotting solution for 1 h at room temperature, washed as
above and incubated with ECL-Plus (GE Healthcare) as directed by the
manufacturer. Anti-mouse GM-CSF receptor-β antibody (sc-678)
(Santa Cruz) diluted 1:500 and anti-actin (sc-1616) (Santa Cruz) diluted 1:1,000
were used as primary antibodies.
Haematological analysis. Blood was obtained from the superior vena cava of
mice and 20 µl was used to measure complete blood counts on a fully automated
Hemavet 850 (Drew Scientific). Data for the precision and linearity of measure-
ments made with the Hemavet 850 can be found online at
http://www.drew-scientific.com/product_hemavet850.htm.
Microarray analysis. Alveolar macrophages were obtained from age-matched 
mice (three per condition) and analysed individually as follows. Total RNA was 
isolated as described above and microarray analysis was performed using the Mouse 
Gene 1.0 ST Array (Affymetrix, Santa Clara, CA) in the CCHMC Affymetrix Core 
using standard procedures as described. Data (available at Gene Expression Omnibus
accession GSE60528) were analysed using the Affymetrix package in the R statis- 
tical programming language (Bioconductor; http://www.bioconductor.org). Probes 
were corrected for background using the Microarray Analysis Suite algorithm, quan- 
tile normalized, and probe sets were summarized using the average difference of 
perfect matches only. Differential expression tests were performed using signifi-
cance analysis of microarrays with Benjamini-Hochberg correction for multiple
testing. Significant gene lists were selected with a Δ that constrained the false dis-
covery rate to less than 10%. The cluster dendrogram was generated from unsuper-
vised hierarchical clustering analysis of microarray data from probes for all 28,853
genes represented on the chip (Spearman correlation; 3 mice per group). In Venn
diagrams, numbers of genes for which expression was altered in alveolar macro-
phages from Csf2rb−/− compared to WT mice (WT→KO) or PMT-treated com-
pared to untreated Csf2rb−/− mice (KO→KO+PMT) are shown. Only genes with
statistically significant changes (false discovery rate <10%) of at least twofold were
marked as increased (up arrows) or decreased (down arrows). The numbers of genes
for which expression was disrupted in Csf2rb−/− mice and normalized by PMT (or
unchanged in both comparisons) are shown in the overlap regions. In gene ontology
analysis, data show the coordinate increases (red) or decreases (blue) in expression
of genes in all gene sets significant at or below a false discovery rate of 10% calcu-
lated by the Gene Set Test with correction for multiple testing.
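
The two selection criteria described above (a false discovery rate below 10% and a change of at least twofold) can be illustrated with a short Python sketch. It is not the authors' pipeline: significance analysis of microarrays derives its statistics by permutation, whereas this sketch substitutes the plain Benjamini-Hochberg step-up adjustment applied to per-gene P values. The function names, gene labels and numbers are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(results, fdr=0.10, min_fold=2.0):
    """Keep genes with adjusted P < fdr and at least `min_fold` change.

    `results` maps a gene label to (p_value, log2_fold_change).
    Returns a dict of gene label -> "up" or "down".
    """
    genes = list(results)
    adjusted = benjamini_hochberg([results[g][0] for g in genes])
    threshold = math.log2(min_fold)
    selected = {}
    for gene, q in zip(genes, adjusted):
        log2_fc = results[gene][1]
        if q < fdr and abs(log2_fc) >= threshold:
            selected[gene] = "up" if log2_fc > 0 else "down"
    return selected


if __name__ == "__main__":
    # Hypothetical per-gene results: (P value, log2 fold change).
    example = {
        "geneA": (0.001, 1.5),
        "geneB": (0.02, -2.0),
        "geneC": (0.03, 0.5),
        "geneD": (0.9, 3.0),
    }
    print(select_genes(example))
```

In this example geneC passes the false discovery threshold but changes by less than twofold, and geneD changes strongly but is not significant, so neither would be marked with an arrow.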
Lentiviral vectors, LSK-cell transduction, and differentiation and expansion
of transduced macrophages. Gene transfer vectors were constructed using rou-
tine methods from the vector backbone of (Ery-GFP), a human immunodeficiency
virus-based, self-inactivating (SIN) lentiviral vector (LV) harbouring a 398-bp U3
deletion eliminating the strong viral promoter/enhancer element. GM-R-LV con-
tains a chimaeric transgene composed of the human elongation factor 1-α (ELF-1α)
promoter (a 1,189-bp fragment containing intron 1 ending 20 bp upstream of the
ATG codon isolated from the pEF-BOS plasmid) followed by the mouse Csf2rb
cDNA (nucleotides −80 to 2,691, GenBank accession number M34397.1) located
3′ of the lentiviral central poly-purine tract and followed by an internal ribosome
entry site (IRES) and then an enhanced green fluorescent protein (GFP) trans-
gene (Fig. 5a). GFP-LV is a lentiviral vector of similar design except that the Csf2rb
cDNA and IRES were omitted and the GFP transgene is driven from the ELF-1α promoter
(Fig. 5a). Both vectors contain a viral splice donor site, packaging sequence, splice
acceptor site, and central polypurine tract (cPPT) 5′ of the ELF-1α promoter and a
woodchuck hepatitis post-transcriptional regulatory element (WPRE) (nucleotides
1093 to 1684; GenBank accession number J04514) located 3′ of the GFP stop codon
as described. Lentiviral vectors were produced by transient transfection as vesicular
stomatitis virus-G (VSVG) virions, concentrated, and titred as described. Csf2rb−/−
LSK cells were isolated, transduced and expanded as described except that transduc-
tions were done at a multiplicity of infection (MOI) of 20 for two 12-h periods, IL-11
was omitted, GM-CSF (10 ng ml⁻¹) and M-CSF (5 ng ml⁻¹) were included, SCF and
Flt-3 ligand were sequentially reduced (50, 1, 0 ng ml⁻¹), and IL-3 was present early.
Transduction, expansion and differentiation of LSK cells into gene-corrected
macrophages were done by adjusting the cytokine ‘cocktail’ mixture to optimize
the culture conditions for each of four sequential stages, which included: (1) LSK
transduction: murine SCF (R&D) 50 ng ml⁻¹, mIL-3 (PeproTech) 10 ng ml⁻¹, hFlt3-L
(PeproTech) 50 ng ml⁻¹ and GM-CSF (R&D) 10 ng ml⁻¹, culture time of two 12-h
periods; (2) progenitor expansion: mSCF 50 ng ml⁻¹, hFlt3-L 50 ng ml⁻¹ and GM-
CSF 10 ng ml⁻¹, culture time of 4 days; (3) macrophage lineage commitment: mSCF
1 ng ml⁻¹, hFlt3-L 1 ng ml⁻¹, GM-CSF 10 ng ml⁻¹, M-CSF (R&D) 5 ng ml⁻¹, culture
time of 3 days; and (4) macrophage differentiation: GM-CSF 10 ng ml⁻¹ and
M-CSF 5 ng ml⁻¹, culture time of 4 days. StemSpan (STEMCELL Technologies)
containing 2% FBS, 1% penicillin/streptomycin, 10 mM dNTP, and low-density
lipoprotein was used as the culture medium for the LSK transduction and DMEM
with 10% FBS, 50 U ml⁻¹ penicillin and 50 µg ml⁻¹ streptomycin was used for all
other stages. Phenotype markers (F4/80, CD11b, CD11c) were analysed by flow
cytometry at each stage to monitor macrophage differentiation. Only adherent
macrophages at the end of this procedure were used for PMT.
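
The four-stage cytokine schedule above can be written down as a small data structure, which makes the sequential reduction of SCF and Flt-3 ligand easy to check. This is an illustrative sketch only: the class and function names are hypothetical, and the two 12-h transduction periods are assumed to be consecutive and counted as one day.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    days: float
    cytokines_ng_per_ml: dict


# Concentrations (ng/ml) and durations as listed in the text.
STAGES = [
    Stage("LSK transduction", 1.0,  # assumption: two consecutive 12-h periods
          {"mSCF": 50, "mIL-3": 10, "hFlt3-L": 50, "GM-CSF": 10}),
    Stage("progenitor expansion", 4.0,
          {"mSCF": 50, "hFlt3-L": 50, "GM-CSF": 10}),
    Stage("macrophage lineage commitment", 3.0,
          {"mSCF": 1, "hFlt3-L": 1, "GM-CSF": 10, "M-CSF": 5}),
    Stage("macrophage differentiation", 4.0,
          {"GM-CSF": 10, "M-CSF": 5}),
]


def total_culture_days(stages):
    """Sum the culture time over all stages."""
    return sum(stage.days for stage in stages)


def concentration_course(stages, cytokine):
    """Concentration of one cytokine at each stage (0 when it is omitted)."""
    return [stage.cytokines_ng_per_ml.get(cytokine, 0) for stage in stages]


if __name__ == "__main__":
    print(total_culture_days(STAGES))
    print(concentration_course(STAGES, "mSCF"))
```

Under the stated assumption the schedule spans about 12 days, with GM-CSF held constant throughout and M-CSF introduced only in the last two stages.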

Localization of PMT-derived cells after transplantation. Several approaches 
were used to identify and localize PMT-derived cells within the lung parenchyma 
and in different organs. 

Intra-pulmonary localization of PMT-derived cells. CD131 immunostaining
and fluorescence microscopy or flow cytometry was used to detect and quantify
transplantation-derived donor macrophages among BAL cells from the lungs of
Csf2rb−/− mice that previously received PMT of WT (C57BL/6) BMDMs,
Lys-MGFP BMDMs, CD45.1+ WT BMDMs, or GM-R-GFP-LV Csf2rb gene-corrected
Csf2rb−/− LSK-derived macrophages.

To localize PMT-derived macrophages to intra-alveolar space or interstitium of
the lung, frozen lung sections from mice that received PMT of Lys-MGFP BMDMs
1 month or 1 year earlier were immunostained for CD68, counterstained with DAPI
(Vector Labs) and evaluated by fluorescence microscopy to identify macrophages,
PMT-derived cells, and nucleated cells, respectively. PMT-derived macrophages
(that is, GFP+ CD68+ cells) located within the intra-alveolar space or the intersti-
tium were then enumerated. To eliminate the possibility of any interference from
non-specific auto-fluorescence of alveolar macrophages, paraffin-embedded lung
sections from these mice were immunostained with anti-GFP antibody (Life Tech-
nologies) and examined by light microscopy to enumerate immunohistochemi-
cally marked macrophages located within the intra-alveolar space or interstitium.
Organ-specific localization of PMT-derived cells. In one approach, Csf2rb−/−
mice received PMT of Lys-MGFP BMDMs and 1 year later, cells isolated from the
BAL, blood, bone marrow, and spleen were evaluated by flow cytometry to detect
GFP+ cells as a marker for PMT-derived cells.

In a second approach, CD45.2+ Csf2rb−/− mice received PMT of CD45.1+
BMDMs and 1 year later, cells isolated from the BAL, blood, bone marrow and spleen
were evaluated by flow cytometry to detect CD45.1+ cells as a marker for PMT-
derived cells.

In a third approach, Csf2rb−/− mice received PMT of Lys-MGFP BMDMs and
1 year later, DNA was extracted from the BAL cells (lung), blood leukocytes, bone
marrow cells, and spleen using a DNeasy Blood & Tissue Kit (Qiagen). Organ-
specific DNA was subjected to PCR amplification using oligonucleotide primers
(Extended Data Table 1) specific for the Lys-MGFP knock-in transgene or the unmod-
ified Lysozyme M gene to detect PMT-derived and endogenous cells, respectively,
as previously reported.

A fourth approach was conducted using a specific operating procedure (TSL
6-13) and Good Laboratory Practice (GLP) conditions within the CCHMC Trans-
lational Core Laboratory. Here, DNA was extracted from the BAL cells (lung), blood
leukocytes, bone marrow cells, and spleen of Csf2rb−/− mice that had received PMT
of GM-R-GFP-LV Csf2rb gene-corrected, Csf2rb−/− LSK-derived macrophages
1 year earlier and subjected to quantitative PCR amplification with oligonucleo-
tide primers specific for the R-U5 of GM-R-GFP-LV using an Applied Biosystems
ABI7900HT Fast Real-Time PCR System (Life Technologies). The number of GM-
R-GFP-LV vector copies per microgram of organ-specific DNA was quantified and
normalized to the level of mouse apolipoprotein B gene as described previously.
Statistical analysis. Numeric data were evaluated for normality and variance using
the Shapiro-Wilk and Levene median tests, respectively, and presented as mean ± s.e.m.
(parametric data) or median and interquartile range (nonparametric data). Statis-
tical comparisons were made with Student’s t-test, one-way analysis of variance, or
Kruskal-Wallis rank-sum test as appropriate; post-hoc pairwise multiple compar-
ison procedures were done using the Student-Newman-Keuls or Dunn’s method
as appropriate. P values of ≤0.05 were considered to indicate statistical significance.
Based on the use of BAL turbidity (the primary outcome variable for efficacy)
measured 2 months after PMT of WT BMDMs into Csf2rb−/− mice and compared
to age-matched, untreated Csf2rb−/− mice, 6 mice per group provided a power of 0.8 to
detect a difference of 1.4 OD at 600 nm using a two-tailed Student’s t-test and a P value
of 0.05. All studies used male and female mice; mice housed in the same cage were
assigned to separate experimental groups at random but without formal randomization
or blinding. Results from all mice were included in the final analysis without exclu-
sion. Analyses, including Kaplan-Meier survival analysis, were performed with
SigmaPlot, Version 12.5 (Systat Software, San Jose, CA). All experiments were repeated
at least twice, with similar results.

Extended Data Figure 1 | Validation of Csf2rb−/− mice as an authentic
model of human hPAP. a, Typical lung pathology showing surfactant-filled
alveoli with well-preserved septa in a child homozygous for CSF2RB*’'* 
mutations and identical pulmonary histopathology in a Csf2rb−/− mouse. PAS
stain. Scale bar, 100 µm. b, Photographs of ‘milky’-appearing BAL from a
14-month-old Csf2rb−/− mouse and normal-appearing BAL from an
age-matched WT mouse (representative of n = 6 mice per group). c, Increased
BAL turbidity and SP-D concentration in 4-month-old Csf2rb−/− mice
compared to age-matched WT mice. d, BAL fluid biomarkers of hPAP
(GM-CSF, M-CSF and MCP-1) are increased in 4-month-old Csf2rb−/− mice
compared to age-matched WT mice. e, Alveolar macrophage biomarkers
(PU.1, Pparg, Abcg1 mRNA) are reduced in 4-month-old Csf2rb−/− compared
to age-matched WT mice. f, Progressive increase in BAL turbidity in Csf2rb−/−
mice but not age-matched WT mice (linear regression: Csf2rb−/−,
slope = 0.1271 ± 0.16 (r², 0.311); WT, slope = 0.031 ± 0.005). g, Progressive
increase in BAL fluid GM-CSF level in Csf2rb−/− mice but not age-matched
WT mice (linear regression: Csf2rb−/−, slope = 0.89 ± 0.016 (r², 0.249);
WT, slope = 0). h, GM-CSF bioactivity in BAL fluid from 10-month-old
Csf2rb−/− or WT mice (or 1 ng ml⁻¹ murine GM-CSF) measured in the
presence of anti-GM-CSF antibody (GM-CSF Ab) or isotype control (Control
Ab) using the GM-CSF-stimulated STAT5 phosphorylation index (STAT5-PI)
assay. Data are mean ± s.e.m. of n = 7 mice per group (c–e), n = 4 (h) or
symbols representing individual WT (n = 38) or Csf2rb−/− (n = 84) mice
and regression fit ± 95% CI (f, g). *P < 0.05, **P < 0.01, ***P < 0.001. ns,
not significant.

[Extended Data Figure 2 image not reproduced. In-figure labels: specific antibody and control antibody; colony counts (BFU-E, CFU-GEMM, CFU-GM) for Lin− cells and BMDMs seeded per dish (× 1,000), with the annotation "2.3 ± 0.3 CFU-GM colonies per 50,000 BMDMs plated; no BFU-E or CFU-GEMM colonies"; WT and KO BMDMs before, immediately after and 24 hours after surfactant exposure.]

Extended Data Figure 2 | Characterization of BMDMs before PMT.
a, b, Photomicrographs of WT BMDMs before transplantation by phase-contrast
microscopy (a) or after DiffQuick staining (b) (representative of n = 7 BMDM
preparations). Scale bar, 20 µm. c, Flow cytometry evaluation of cell-surface phenotypic
markers on WT BMDMs before PMT. d, Photographs of methylcellulose
cultures of Lin− cells (5,000 per dish) from bone marrow (left) and BMDMs
(50,000 per dish) prepared as described in the Methods (right) and typical
colonies (below) (representative of n = 3 per condition). e, Colony counts of
BFU-E, CFU-GEMM and CFU-GM showing that BMDMs contained
<0.005% CFU-GM and no BFU-E or CFU-GEMM progenitors,
corresponding to 93 CFU-GM per dose of BMDMs administered (n = 3
determinations per condition). f, g, Evaluation of surfactant clearance capacity.
Representative photomicrographs of BMDMs from WT (left) or Csf2rb−/−
(right) mice examined before (top) or immediately after incubation with
surfactant for 24 h (middle), or after exposure, removal of extracellular
surfactant and culture for 24 h in the absence of surfactant (lower), after
oil-red-O staining (representative of n = 3 per condition). Scale bar, 20 µm.
g, Measurement of surfactant clearance by BMDMs after exposure as just
described (f) and quantified using a visual grading scale (the oil-red-O staining
index) to measure the degree of staining. Bars represent the mean ± s.e.m.
(n = 3 per condition) of oil-red-O staining score for 10 high-power fields for
each group. ND, not detected; ns, not significant; ***P < 0.001.

Extended Data Figure 3 | Efficacy of PMT in Csf2rb−/− mice and
characterization of macrophages after PMT. a, Detection of CD131 (top) or
actin (bottom) in BAL cells by western blotting 1 year after PMT (each lane
represents one mouse of 6 per group). b, Representative cytology of BAL
obtained 1 year after PMT after staining with PAS or oil red O (ORO) (6 mice
per group). Scale bar, 25 µm. Oil-red-O-positive cells were seen rarely in WT
mice and occasionally in PMT-treated Csf2rb−/− mice (insets). Cytological
abnormalities in BAL from untreated Csf2rb−/− mice, including large, ‘foamy’,
PAS- and oil-red-O-stained alveolar macrophages and PAS-stained cellular
debris, were corrected by PMT. c, Representative photomicrographs of
PAS-stained whole-mount lung sections 1 year after PMT. Note that some
residual disease remained at 1 year (original magnification, ×1). d, GFP+ cells
in BAL cells from WT or Csf2rb−/− mice 2 months after PMT of Lys-MGFP
BMDMs (representative of n = 3 (WT) or n = 6 (Csf2rb−/−) mice) (original
magnification, ×20). e, Macrophage replication after PMT. Csf2rb−/− mice
received Lys-MGFP BMDMs by PMT and paraffin-embedded lung was
immunostained for Ki67 1 month or 1 year later. Scale bar, 50 µm; inset, 10 µm.
f, Ki67 staining of BAL cells from untreated WT mice (e). Inset shows positive
(left) or negative (right) staining. Scale bar, 50 µm; inset, 10 µm. Graph
shows the per cent Ki67+ BAL cells in age-matched WT mice (n = 5).
g, Representative immunofluorescence photomicrographs of frozen lung
sections 1 year after PMT of Lys-MGFP BMDMs into Csf2rb−/− mice identifying GFP+
cells (top), Ki67+ cells (middle) and GFP+ Ki67+ (replicating, PMT-derived)
cells (bottom) (representative of n = 3 mice). Scale bar, 20 µm; inset scale bar,
10 µm. Quantitative summary data are shown in Fig. 2c. h, Localization of
macrophages within the lungs 1 year after PMT of Lys-MGFP BMDMs into
Csf2rb−/− mice and visualization in frozen lung sections after CD68
immunostaining, DAPI counter staining, and fluorescence microscopy to
detect CD68+ GFP+ cells (that is, PMT-derived macrophages) or
CD68+ GFP− cells (that is, non-PMT-derived endogenous macrophages).
Graph shows quantitative data for n = 6 mice. i, Localization of macrophages in
these same mice (h) by detecting GFP by immunohistochemical staining of
paraffin-embedded lung sections using light microscopy to eliminate potential
interference from autofluorescence (representative of n = 6 mice). Quantitative
summary data are shown in Fig. 3b.

Extended Data Figure 4 | Tissue distribution and characterization of
transplanted cells 1 year after PMT. a–d, Two-month-old Csf2rb−/− mice
(4 per group) received one PMT of Lys-MGFP BMDMs. Twelve months later,
untreated, age-matched WT Lys-MGFP or Csf2rb−/− mice and PMT-treated
Csf2rb−/− mice were evaluated using flow cytometry to detect GFP+ cells in the
indicated organs. Representative data (a) and the percentage of GFP+ cells
in the gated region are shown (b). Similar results were observed in Csf2rb−/−
mice 2 months after PMT of Lys-MGFP BMDMs except the percentage of GFP+
BAL (lung) cells was not quantified (not shown). c, Detection of Lys-MGFP
PMT cells by PCR. PCR of genomic DNA from BAL cells (Lung), white blood
cells (Blood), bone marrow (BM) cells and splenocytes (Spleen) 1 month or
1 year after Lys-MGFP BMDM PMT was performed to detect EGFP and
Lysozyme M gene. BAL cells (Lung) from WT and Lys-MGFP mice are shown as
negative and positive controls for EGFP. EGFP was only detected in lung.
d, Vector copy number analysis after gene-corrected BMDM PMT.
Quantitative PCR with vector-specific primers (R-U5) was performed using
genomic DNA from BAL cells (Lung), white blood cells (Blood), bone marrow
(BM) cells and splenocytes (Spleen) obtained 1 year after PMT of gene-
corrected macrophages. Note that the viral vector was only detected in lung.
e–h, CD45.2+ Csf2rb−/− mice received one PMT of CD45.1+ BMDMs from
congenic WT mice (e) and 1 year later, untreated, age-matched WT (CD45.1+)
or Csf2rb−/− (CD45.2+) mice and PMT-treated Csf2rb−/− mice were
evaluated by flow cytometry to detect CD45.1+ cells in the indicated organs.
Representative data (f) and the percentage of CD45.1+ cells in the gated regions
are shown (g). Phenotypic characterization of PMT-derived (CD45.1+) cells
(as shown in the gated region (f)). Results are similar to those for PMT of
Lys-MGFP BMDMs (Fig. 3d). Numeric data are mean ± s.e.m. of n = 4 mice per
group (b, d) or n = 5 mice per group (g). ND, not detected. *P < 0.05. ns,
not significant.

Extended Data Figure 5 | Global gene expression analysis of alveolar
macrophages from age-matched WT, Csf2rb−/− and Csf2rb−/− mice
1 year after PMT of WT BMDMs. a, Expression of Spi1 (PU.1) and Pparg
(PPARγ) was confirmed by qRT-PCR using independent samples (6 mice
per group). b, Venn diagrams showing numbers of genes whose expression
was altered in alveolar macrophages from Csf2rb−/− compared to WT mice
(WT→KO) or PMT-treated compared to untreated Csf2rb−/− mice
(KO→KO+PMT). Only genes with statistically significant changes (false
discovery rate <10%) of at least twofold were marked as increased (up arrows)
or decreased (down arrows). The numbers of genes for which expression was
disrupted in Csf2rb−/− mice and normalized by PMT (or unchanged in both
comparisons) are shown in the overlap regions. c, Gene ontology analysis
identifying pathways disrupted in Csf2rb−/− mice and restored by PMT.
Data show the coordinate increases (red) or decreases (blue) in expression of
genes in all gene sets significant at or below a false discovery rate of 10%
calculated by the Gene Set Test with correction for multiple testing. d, Heat
maps showing differentially expressed genes in multiple KEGG pathways
including PPARγ-regulated genes, glycerophospholipid metabolism, peroxisome
function, apoptosis, cell cycle control, and immune host defence. Genes with
increased or decreased transcript levels are shown by red and blue colours,
respectively. e, Confirmation by RT-PCR for selected genes important in lipid
metabolism, using independent samples. Data are mean ± s.e.m. (6 mice per
group). *P < 0.05.

Extended Data Figure 6 | Effects of PMT of gene-corrected macrophages
on hPAP. a, Macrophages derived from Csf2rb−/− LSK cells transduced with
GM-R-LV or GFP-LV, or from non-transduced WT LSK cells (indicated)
were examined by light microscopy after DiffQuick staining (top), or by
immunofluorescence microscopy after staining with anti-CD131 (GM-CSF-
R-β) and DAPI (upper middle), DAPI alone (lower middle), or anti-CD68 and
DAPI (bottom). Images are representative of three experiments per condition.
b, Evaluation of GM-CSF receptor signalling in the indicated cells (before
PMT) by measurement of GM-CSF-stimulated STAT5 phosphorylation by
flow cytometry. Representative of n = 3 experiments per condition.
Quantitative summary data are shown in Fig. 5b. c, Western blotting to detect
GM-CSF receptor-β (CD131) (top) or actin (bottom, as a loading control)
in BAL cells from age-matched Csf2rb−/− mice 2 months after PMT as
indicated (each lane represents one mouse of n = 10, 8, 10 per group,
respectively). d, Appearance of BAL from age-matched Csf2rb−/− mice
2 months after PMT as indicated (representative of n = 10, 8, 10 per group,
respectively). e, f, One year after PMT of GM-R-LV transduced Csf2rb−/− LSK
cell-derived macrophages in Csf2rb−/− mice, GFP+ cells were identified (e)
and evaluated for cell surface markers by flow cytometry (f) (representative of
n = 7 mice).

Extended Data Table 1 | Oligonucleotide primers used to quantify mRNA transcripts by qRT-PCR and detection of PMT-derived cellular DNA
by PCR

TaqMan® Primers

Gene name        Accession no.      Product (bp)   Catalogue no.
Spi1 (PU.1)      NM_011355.1        91             Mm00488142_m1
Pparg            NM_001127330.1     101            Mm01184322_m1
Abcg1            NM_009593.2        65             Mm00437390_m1
Csf1             NM_001113529.1     70             Mm00432686_m1
Csf2             NM_009969.4        125            Mm01290062_m1
Csf2rb           NM_007780.4        60             Mm00655745_m1
Nr1h3 (LXR)      NM_001177730.1     57             Mm00443451_m1
Olr1             NM_138648.2        64             Mm00454586_m1
Lepr             NM_001122899.1     97             Mm00440181_m1
Fabp1            NM_007980.2        116            Mm00433188_m1
Lipf             NM_026334.3        87             Mm00471152_m1
Abca1            NM_013454.3        55             Mm00442646_m1
Apoe (Apo E)     NM_009696.3        64             Mm01307192_m1
Apoc2            NM_009695.3        60             Mm00437571_m1
Pla2g7           NM_013737.5                       Mm00479105_m1
Gapdh            NM_008084.2                       4352932E
18S RNA          X03205.1                          4310893E

Custom Primers

Gene name        Accession no.      Product (bp)   Sequence (5′ → 3′)
Lys-MGFP         NA (transgene)     680            aag ctg ttg gga aag gag gg
                                                   gtc gcc gat ggg gat gtt ct
Lysozyme-M       M21049             220            aag ctg ttg gga aag gag gg
                                                   tcg gcc agg ctg act cca ta

Extended Data Table 2 | Effect of the number of macrophages transplanted on the efficacy of PMT therapy of hPAP in Csf2rb−/− mice

Parameter | WT | KO | KO + PMT (macrophages per dose × 10⁶)
Turbidity, O.D. 600 nm | 0.0553 (0.023–0.21) | 1.96† (1.85–2.74) | 0.765‡ (0.599–0.823) | 0.685‡ (0.472–0.997) | 0.38‡ (0.283–0.685) | 0.536‡ (0.301–0.732)
SP-D, µg/ml BAL | 75.9 (51–84) | 2105† (1739–2396) | 1475‡ (1367–2034) | 1414‡ (656–1951) | 911‡ (660–1179) | 1299‡ (762–1634)
GM-CSF, pg/ml BAL | | 40.8† (21.4–54.5) | 28.1‡ (15.5–37.8) | 17.2‡ (12.2–20.5) | 13.8‡ (6.97–17.5) | 14.8‡ (10.4–18.0)
M-CSF, pg/ml BAL | | 45.0† (32.3–81.9) | 30.4§ (14.2–36.0) | 25.4§ (14.8–36.3) | 21.7§ (20.1–42.7) | 29.3§ (23.5–40.5)
MCP-1, pg/ml BAL | 0.88 (0–25.1) | 135† (123–163) | 72.1‡ (36.9–121) | 57.5‡ (45.9–80.7) | 49.0‡ (26.1–63.0) | 64.4‡ (28.2–109)
Csf2rb mRNA, A.U. | 1.05 (0.86–1.08) | 0† (0–0) | 0.108‡ (0.085–0.213) | 0.167‡ (0.095–0.82) | 0.265‡ (0.097–1.46) | 0.447‡ (0.197–0.88)
Spi1 mRNA, A.U. | 1.02 (0.78–1.13) | 0.306† (0.289–0.393) | 0.468‡ (0.27–0.470) | 0.474‡ (0.351–0.707) | 0.475‡ (0.367–0.803) | 0.480‡ (0.446–0.595)
Pparg mRNA, A.U. | 0.929 (0.902–1.13) | 0.052† (0.0–0.106) | 0.241‡ (0.196–0.362) | 0.295‡ (0.267–0.607) | 0.516‡ (0.327–0.923) | 0.360‡ (0.318–0.603)
Abcg1 mRNA, A.U. | 0.933 (0.833–1.12) | 0.08† (0.07–0.153) | 0.148‡ (0.135–0.183) | 0.232‡ (0.134–0.327) | 0.179‡ (0.106–0.455) | 0.220‡ (0.173–0.229)

A.U., arbitrary units; BAL, bronchoalveolar lavage; hPAP, hereditary pulmonary alveolar proteinosis; KO, Csf2rb knockout mice; O.D., optical density; PMT, pulmonary macrophage transplantation; WT, wild type.
* Mice received the indicated numbers of WT BMDMs once by PMT. Three months later, BAL fluid and cells were obtained from PMT-treated knockout mice, and age-matched, untreated WT or knockout mice (7
mice per group for each condition evaluated). BAL turbidity, the concentration of SP-D, GM-CSF, M-CSF and MCP-1 in BAL fluid, and the relative abundance of Csf2rb, Spi1 (PU.1), Pparg and Abcg1 mRNA
transcripts in alveolar macrophages were measured as described in Methods. All data are presented as median (interquartile range (IQR)) and between-group comparisons were done using non-parametric
methods for consistency since results for some groups were either undetectable, not normally distributed or of unequal variance.

† Result is significantly different compared to untreated WT mice (Mann-Whitney rank sum test, P < 0.001).

‡ Result is significantly different compared to untreated KO mice (Kruskal-Wallis one-way analysis of variance on ranks with pairwise comparison to untreated KO mice by the Student-Newman-Keuls method,
P < 0.05).

§ Result is not significantly different compared to untreated KO mice (Kruskal-Wallis One Way Analysis of Variance on Ranks, P=0.133). 
Extended Data Table 3 | Comparison of the effects of single versus repeated macrophage administrations on the efficacy of PMT therapy of hPAP in Csf2rb−/− mice 
Number of PMT Administrations 
Parameter | One | Four | P value † 
Turbidity, O.D. 600 nm | 2.014 (1.77-2.53) | 1.68 (1.49-3.29) | 0.486 
SP-D, pg/ml BAL | 816 (750-996) | 772 (653-796) | 0.486 
GM-CSF, pg/ml BAL | 18.3 (15.3-35.6) | 14.3 (13.8-27.8) | 0.20 
M-CSF, pg/ml BAL | 55.5 (50.8-65.6) | 29.7 (28.1-49.8) | 0.114 
MCP-1, pg/ml BAL | 88.3 (74.0-118) | 53.6 (35.2-78.8) | 0.114 
Spi1 mRNA, A.U. | 0.377 (0.284-0.545) | 0.322 (0.268-0.362) | 0.686 
Pparg mRNA, A.U. | 0.201 (0.122-0.474) | 0.234 (0.169-0.303) | 1.0 
Abcg1 mRNA, A.U. | 0.116 (0.098-0.236) | 0.117 (0.083-0.131) | 0.886 


A.U., arbitrary units; BAL, bronchoalveolar lavage; hPAP, hereditary pulmonary alveolar proteinosis; KO, Csf2rb knockout mice; O.D., optical density; PMT, pulmonary macrophage transplantation. 

* Knockout mice received 2 × 10^6 macrophages by PMT either once or as four monthly doses. Four months after the initial PMT administration in both groups, BAL fluid and cells were obtained from all mice (6 per 
group for each condition evaluated). BAL turbidity, the concentration of SP-D, GM-CSF, M-CSF and MCP-1 in BAL fluid, and the relative abundance of Spi1 (PU.1), Pparg and Abcg1 mRNA transcripts in alveolar 
macrophages were measured as described in Methods. All data are presented as median (interquartile range (IQR)) and between-group comparisons were done using non-parametric methods for consistency 
since results for some groups were either not normally distributed or of unequal variance. 

† Mann-Whitney rank sum test. P values of ≤ 0.05 were considered to indicate statistical significance. 
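
The summary statistics and the between-group test named in these footnotes can be illustrated with a minimal sketch. It assumes small groups with no tied values and enumerates every relabelling of the pooled data to obtain an exact two-sided P value; the function names and the sample values are hypothetical placeholders, not data or code from this study, and the quartile convention used by the authors is not stated.

```python
from itertools import combinations
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, Q1, Q3); the 'inclusive' quartile rule is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def u_statistic(x, y):
    """Mann-Whitney U for sample x: number of pairs with x_i > y_j (ties count 0.5)."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided P value by enumerating all splits of the pooled values."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    centre = n * m / 2.0                      # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - centre)
    extreme = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        if abs(u_statistic(gx, gy) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical readings for two groups of six animals (not values from the table).
once = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
four_doses = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
p_value = exact_two_sided_p(once, four_doses)
```

For two completely separated groups of six, only 2 of the 924 possible splits are as extreme as the observed one, so the smallest attainable two-sided P value at this group size is 2/924 (about 0.002).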
Extended Data Table 4 | Effect of PMT of WT or gene-corrected macrophages on haematological indices and lung proinflammatory cytokine levels 


Blood Safety Evaluation - PMT Using Wild-Type Macrophages 
Hematologic Parameter | Normal range | WT (n=6) | KO (n=6) | KO + PMT (n=6) 
Hemoglobin, g/dL | 11.0–15.1 | 12.7 (12.2–13.2) | 14.5 (13.9–15.1) ¢ | 12.9 (12.2–13.4) ‖ 
Hematocrit, % | 35.1–45.4 | 46.5 (44.5–47.3) | 53.0 (50.2–56.3) | 47.2 (43.0–50.5) ‖ 
WBC, ×10^3/µl | 1.8–10.7 | 2.53 (1.53–5.35) | 5.08 (3.93–6.34) t | 3.27 (2.80–4.10) ‖ 
Neutrophils, ×10^3/µl | 0.1–2.4 | 0.29 (0.107–0.67) | 0.89 (0.125–2.63) ¢ | 0.670 (0.102–1.41) § 
Lymphocytes, ×10^3/µl | 0.9–9.3 | 2.18 (1.06–4.26) | 3.24 (1.60–5.79) t | 2.37 (1.76–2.94) § 
Monocytes, ×10^3/µl | 0.0–0.4 | 0.15 (0.11–0.24) | 0.155 (0.12–0.32) t | 0.19 (0.15–0.25) § 
Eosinophils, ×10^3/µl | 0.0–0.2 | 0.01 (0.01–0.04) | 0.02 (0.01–0.06) t | 0.015 (0.01–0.02) § 
Basophils, ×10^3/µl | 0.0–0.2 | 0.0 (0.0–0.01) | 0.01 (0.01–0.01) t | 0.0 (0.0–0.003) ‖ 
Platelets, ×10^3/µl | 592–2972 | 558 (463–735) | 1035 (830–1145) t | 993 (838–1031) § 

Blood Safety Evaluation - PMT Using Gene-Corrected Macrophages 
Hematologic Parameter | Normal range | WT (n=5) | KO (n=5) | KO + PMT (n=7) 
Hemoglobin, g/dL | 11.0–15.1 | 10.2 (8.5–10.8) | 16.9 (13.8–18.5) t | 12.2 (11.1–12.9) 
Hematocrit, % | 35.1–45.4 | 37.5 (33.3–40.4) | 68.8 (55.4–80.3) ¢ | 49.4 (48.8–55.9) ‖ 
WBC, ×10^3/µl | 1.8–10.7 | 1.90 (1.30–10.7) | 6.48 (5.11–8.57) t | 5.08 (2.32–5.88) § 
Neutrophils, ×10^3/µl | 0.1–2.4 | 1.31 (0.67–8.01) | 2.61 (1.71–2.85) t | 1.49 (0.96–2.89) § 
Lymphocytes, ×10^3/µl | 0.9–9.3 | 0.65 (0.26–1.38) | 3.39 (2.70–5.15) ¢ | 2.03 (1.15–3.33) 
Monocytes, ×10^3/µl | 0.0–0.4 | 0.25 (0.15–0.90) | 0.44 (0.29–0.87) t | 0.25 (0.15–0.40) § 
Eosinophils, ×10^3/µl | 0.0–0.2 | 0.01 (0.01–0.43) | 0.02 (0.02–0.07) t | 0.08 (0.01–0.16) § 
Basophils, ×10^3/µl | 0.0–0.2 | 0 (0.0–0.07) | 0.01 (0.00–0.04) t | 0.01 (0.00–0.03) § 
Platelets, ×10^3/µl | 592–2972 | 619 (441–1478) | 1229 (1094–1460) ¢ | 1381 (872–1614) § 

Lung Safety Evaluation - PMT Using WT Macrophages 
Cytokine in BAL Fluid | WT (n=6) | KO | KO + PMT (n=6) 
IL-6, pg/ml | 3.01 (1.35–3.89) | 164 (43.6–364) t | 11.02 (4.87–60.1) ‖ 
IL-1β, pg/ml | 0 (0.0–0.0) | 3.49 (2.42–9.52) t | 0.92 (0.04–1.02) ‖ 
TNFα, pg/ml | 0.52 (0.0–0.94) | 12.4 (7.22–17.9) ¢ | 2.75 (1.08–3.17) 

Lung Safety Evaluation - PMT Using Gene Corrected Macrophages 
Cytokine in BAL Fluid | WT (n=5) | KO (n=5) | KO + PMT (n=7) 
IL-6, pg/ml | 0.30 (0.0–2.43) | 16.2 (14.1–62.1) + | 29.5 (13.7–42.5) § 
IL-1β, pg/ml | 0 (0.0–0.29) | 2.01 (0.47–8.1) ¢ | 1.65 (0.94–3.08) § 
TNFα, pg/ml | 2.45 (0.46–2.76) | 6.14 (3.68–7.99) ¢ | 3.06 (2.45–7.98) § 


BAL, bronchoalveolar lavage; hPAP, hereditary pulmonary alveolar proteinosis; KO, Csf2rb knockout mice; O.D., optical density; PMT, pulmonary macrophage transplantation; WT, wild type. 

* Knockout mice received WT or Csf2rb gene-corrected knockout macrophages (2 × 10^6 cells per mouse) once by PMT and 12 months later, blood and BAL fluid were obtained from PMT-treated knockout 
mice, or age-matched, untreated knockout or WT mice and evaluated as described in Methods. Number of mice per group is indicated. All data are presented as median (interquartile range (IQR)) and between- 
group comparisons were done using non-parametric methods (Mann-Whitney rank sum test) for consistency since results for some groups were either not normally distributed or of unequal variance. 

P values ≤ 0.05 were considered to be significant. 

† Result is not significantly different compared to WT mice. 

‡ Result is significantly different compared to WT mice. 

§ Result is not significantly different compared to untreated knockout mice. 

‖ Result is significantly different compared to untreated knockout mice. 
Structure and immune recognition of trimeric pre-fusion HIV-1 Env 


Marie Pancera, Tongqing Zhou, Aliaksandr Druz, Ivelin S. Georgiev, Cinque Soto, Jason Gorman, Jinghe Huang, 
Priyamvada Acharya, Gwo-Yu Chuang, Gilad Ofek, Guillaume B. E. Stewart-Jones, Jonathan Stuckey, Robert T. Bailer, 
M. Gordon Joyce, Mark K. Louder, Nancy Tumba, Yongping Yang, Baoshan Zhang, Myron S. Cohen, Barton F. Haynes, 
John R. Mascola, Lynn Morris, James B. Munro, Scott C. Blanchard, Walther Mothes, Mark Connors & Peter D. Kwong 


The human immunodeficiency virus type 1 (HIV-1) envelope (Env) spike, comprising three gp120 and three gp41 sub- 
units, is a conformational machine that facilitates HIV-1 entry by rearranging from a mature unliganded state, through 
receptor-bound intermediates, to a post-fusion state. As the sole viral antigen on the HIV-1 virion surface, Env is both 
the target of neutralizing antibodies and a focus of vaccine efforts. Here we report the structure at 3.5 Å resolution for an 
HIV-1 Env trimer captured in a mature closed state by antibodies PGT122 and 35022. This structure reveals the pre- 
fusion conformation of gp41, indicates rearrangements needed for fusion activation, and defines parameters of immune 
evasion and immune recognition. Pre-fusion gp41 encircles amino- and carboxy-terminal strands of gp120 with four 
helices that form a membrane-proximal collar, fastened by insertion of a fusion peptide-proximal methionine into a 
gp41-tryptophan clasp. Spike rearrangements required for entry involve opening the clasp and expelling the termini. 
N-linked glycosylation and sequence-variable regions cover the pre-fusion closed spike; we used chronic cohorts to 
map the prevalence and location of effective HIV-1-neutralizing responses, which were distinguished by their 
recognition of N-linked glycan and tolerance for epitope-sequence variation. 


Over the last 50 years, more than 70 million people have been infected 
or killed by the human immunodeficiency virus type 1 (HIV-1)'. A 
dominant contributing factor has been the molecular trickery of the 
HIV-1 envelope (Env) spike, a type I fusion machine that facilitates virus 
entry into cells by interacting with host cellular receptors and fusing 
membranes of virus and host cell (reviewed in ref. 2). Despite its exposed 
position on the viral membrane and the generation of narrow-breadth 
neutralizing antibody responses throughout the course of HIV-1 infec- 
tion, the evolving HIV-1 Env spike successfully evades most antibody- 
mediated neutralization’. This evasion is, to a large degree, responsible 
for the difficulty in developing an effective HIV-1 vaccine. 

Initially synthesized as a gp160 precursor, which is cleaved into gp120 
and gp41 subunits, the trimeric HIV-1 Env spike displays unusual post- 
translational processing, including the addition of 25-30 N-linked gly- 
cans per gp120-gp41 protomer’, tyrosine sulphation’, and slow signal 
peptide cleavage®. Env rearranges from a pre-fusion mature closed state 
that evades antibody recognition through intermediate open states that 
bind to receptors, CD4 and co-receptor (either CCR5 or CXCR4), to a 
post-fusion state (reviewed in ref. 2). Over the last 20 years, substantial 
atomic-level detail has been obtained on these states, including struc- 
tures of receptor-bound gp120’, post-fusion gp41*°, and most recently 
the trimeric arrangement of pre-fusion gp120 along with two gp41 he- 
lices, one of which was aligned in sequence’*"’. The pre-fusion structure 
of gp41 has, however, resisted atomic-level analysis. Because the primary 
structural rearrangement driving membrane fusion is the gp41 transition 
from pre-fusion to post-fusion conformations, the lack of a pre-fusion 
gp41 structure has stymied attempts to provide a coherent picture of 
the conformational rearrangements the spike undergoes to facilitate 
entry. 

Here we use neutralizing antibodies PGT122” and 35022" to cap- 
ture the HIV-1 spike in a pre-fusion mature closed state. We obtained 
crystals of the antigen-binding fragments (Fabs) of these two antibodies 
in complex with a soluble, cleaved Env trimer construct (BG505 SOSIP. 
664)'*""¢ and determined its atomic-level structure. Examination of this 
structure in the context of previously determined gp120 and gp41 struc- 
tures affords a mechanistic understanding of the conformational transi- 
tions the spike undergoes to facilitate virus entry. We delineated aggregate 
parameters of glycan shielding and genetic variation and used infected 
donor serum to determine where the immune system succeeds in recog- 
nizing the HIV-1 spike. Analysis of the pre-fusion HIV-1 Env structure 
and its conformational rearrangements, combined with an understand- 
ing of its evasion from and vulnerabilities to the immune system, reveal 
similarities to other type I viral fusion machines as well as features of 
recognition by the human immune system unique to this important 
vaccine target. 


Structure determination and overall structure 

Atomic-level information for virtually all of the HIV-1 Env ectodomain 
in its pre-fusion conformation has been obtained from antibody-bound 
complexes (Extended Data Fig. 1a). The recently determined crystal 
structure’? of a soluble cleaved HIV-1 Env based on the BG505 SOSIP. 
664 construct was no exception; in particular, while an artificial disul- 
phide and other modifications of the SOSIP.664 construct were critical 
to production of a homogeneous, soluble, cleaved trimer’, antibody 
PGT122 appeared to facilitate crystallization’®. Diffraction from crystals 
of the PGT122 complex, however, extended to only 4.7 Å resolution, 
hampering the trace of non-helical regions of gp41 as well as the 
placement and registry of side chains’. To obtain improved crystals, 
we explored the addition of antibody 35022, which recognizes a 
gp120-gp41 epitope’*. Addition of 35022 to PGT122-bound viral spike in the 
membrane-bound virion context showed single-molecule fluorescence 
resonance energy transfer (smFRET) responses that closely resembled 
those of the mature native unliganded spike (Extended Data Fig. 1b)"*. 
In the context of crystallization, addition of 35022 to the 
PGT122-BG505 SOSIP.664 complex led to ternary complex crystals in space 
group P6₃. Although diffraction was anisotropic, we succeeded in 
collecting ~3.5 Å data from a single crystal (Extended Data Table 1). 
Structure solution by molecular replacement with free structures of Fab 
PGT122”, Fab 35022" and gp120” revealed a double antibody-bound 
gp120-gp41 protomer to occupy the asymmetric unit and led to an 
Rwork/Rfree of 21.35%/24.80%. 

Overall, the HIV-1 spike forms a three-blade propeller, capped at 
its membrane-distal apex by antibody PGT122 and at the membrane- 
proximal end by antibody 35022 (Fig. 1 and Extended Data Fig. 2a, b). 
Protomer interactions occur through assembled variable regions, V1, 
V2 and V3, which comprise the trimer association domain”’ at the 
membrane-distal portion of the spike, and also through gp41, primarily 
between helical interactions around the trimer axis'"’. No trimeric in- 
teractions are contributed by the gp120 core; indeed, a cleft or opening 
is found under the trimer association domains along the threefold axis 
where such associations might occur. Trimeric pre-fusion gp41 forms 
a platform through which the gp120 termini extend towards the viral 
Figure 1 | Structure of a pre-fusion HIV-1 Env trimer bound by PGT122 
and 35022 antibodies. One protomer and associated Fabs are shown in 
ribbon and stick representation, a second protomer in surface representation, 
and the third protomer in grey. Residues comprising the refined HIV-1 Env 
model are displayed on the bar, with beginning and final ordered residue of 
each segment labelled; vertical lines demark termini of the mature ectodomain 
subunits; unmodelled regions, which show residues disordered or not present 
in the BG505 SOSIP.664 construct, as well as glycans, which are disordered 
or not present, are shown in grey. 35022 and PGT122 interactions with the 
HIV-1 Env trimer are shown in Extended Data Fig. 9a-f, and bound versus 
unbound Fabs are shown in Extended Data Fig. 9g. 
membrane. Unusually slow signal peptide cleavage®, which keeps the 
N terminus of gp120 proximal to the membrane, may facilitate folding 
of pre-fusion HIV-1 Env. 


Pre-fusion structure of gp41 

Pre-fusion gp41 wraps its hydrophobic core around extended N and C 
termini-strands of gp120 (Fig. 2a). It forms a four-helix collar comprising 
helices α6 (Met 530gp41–Asn 543gp41), α7 (Gly 572gp41–Ile 595gp41), 
α8 (Leu 619gp41–Trp 623gp41), and α9 (Trp 628gp41–Asp 664gp41) (the 
numbering of pre-fusion gp41 helices and strands continues the 
nomenclature established for the gp120 subunit, which ends with helix 
α5 and strand β26; for clarity, the molecule is named after each residue 
number). The first residue of gp41 visible in electron density corresponds 
to Val 518gp41, in the fusion peptide. An extended stretch connects 
to Leu 523gp41, which interacts hydrophobically with Trp 45gp120 and 
Ile 84gp120, both of which are part of the seven-stranded β-sandwich 
around which the gp120 inner domain is organized’***. The main chain 
of gp41 follows gp120 strand β0 away from the trimer axis towards the 
viral membrane until residue Met 530gp41, where the fold reverses itself 
and extends through α6 towards the trimer axis and away from the viral 
membrane. Density between residues 547gp41 and 569gp41 is sparse 
(Extended Data Fig. 3a, b), and ultimately connects to helix α7, which forms 
a parallel coiled-coil about the trimer axis. At the end of α7 is the gp41 
cysteine loop (spanned by the Cys 598gp41–Cys 604gp41 disulphide), whose 
C-terminal residues initiate strand β27 (Leu 602gp41–Thr 606gp41), which 
forms hydrogen bonds in an anti-parallel fashion with strand β−4 
(β-strand negative 4) from the N terminus of gp120. The intersubunit 


Figure 2 | Pre-fusion structure of gp41. a, gp41 forms a four-helix collar, 
which wraps around extended N and C termini of gp120. Both gp120 (red) and 
gp41 (rainbow from blue to orange) are depicted in ribbon representation, 
with select residues and secondary structure labelled (additional labels are 
shown in Extended Data Fig. 10). The location of the trimer axis is indicated 
with a ‘3’ inside a triangle. The orientation shown here is similar to that of Fig. 1, 
with perpendicular orientations provided in b and c. Zoom insert: the gp41 
collar is clasped by the insertion of Met 530gp41 into a tryptophan sandwich and 
by the alignment of helices α6 and α8. 2Fo − Fc electron density for clasp 
residues is depicted at 1σ. b, gp41 holds the N and C termini of gp120 in its 
hydrophobic core. Colouring and representation are the same as in a, except 
that hydrophobic side chains are shown in stick representation and the 
orientation is rotated 90°, to depict the view from the viral membrane. c, gp41- 
trimer interfaces as viewed from side in ribbon and surface representation. 
Overall, the pre-fusion structure of gp41 and its trimeric arrangement appear 
to have no close structural relatives in the Protein Data Bank (Supplementary 
Table 2). N618, N611 and N637 indicate gp41 residues with attached 
N-linked glycans (see Fig. 1 bar). 
disulphide (‘SOS’)"* between residues 501gp120 and 605gp41 welds the C 
terminus of gp120 to the membrane-proximal end of strand β−4 (Fig. 2a). 
Upon passing the gp120 termini, gp41 reaches α8, whose C terminus 
aligns spatially with the N terminus of α6. After α8, the α9 helix reverses 
direction, again wrapping past the N and C termini of gp120, before 
extending horizontally along the edge of the spike to reach the gp120 
termini from a neighbouring protomer. 

Topologically, the gp41 subunit completes a single circle around the 
gp120 termini with the insertion of a hydrophobic prong comprising 
the side chain of Met 530gp41 (which is located at the N terminus of α6, 
proximal to the fusion peptide), into a triple-tryptophan clasp formed 
by Trp 623gp41 (from the C terminus of α8), Trp 628gp41 (from the N 
terminus of α9) and Trp 631gp41 (one turn into α9) (Fig. 2a insert). The 
alignment of helices α6 and α8 provides electrostatic complementarity 
that helps to stabilize the neighbouring methionine-tryptophan clasp. 

Within a single protomer, the buried surface area between gp41 and 
gp120 totals 5,270 Å², including 216 Å² from glycan-protein interactions 
(Supplementary Table 1). A substantial portion of this is hydrophobic: 
gp41 essentially wraps its hydrophobic core around the N and 
C termini of gp120 (Fig. 2b). Trimer interfaces also bury a large surface 
area (3,140 Å² contributed by each protomer, comprising 1,920 Å² from 
the gp41-gp41 interface, 861 Å² from the gp120-gp120 interface and 
360 Å² from the gp120-gp41 interface) (Extended Data Fig. 2c-f). Close 
to the trimer axis, these involve helix α7, as well as the N-terminal portion 
of the gp41-cysteine loop. Further from the trimer axis, interactions 
involve α9. Other than interactions of α7, most inter-protomer 
interactions are hydrophilic (Fig. 2c). 


Pre-fusion to post-fusion gp41 transition 


To understand the conformational transition from pre-fusion to post- 
fusion gp41, we compared the gp41 pre-fusion structure in our anti- 
body-bound HIV-1 Env trimer with previously determined post-fusion 
structures*”’*”* (Fig. 3). Post-fusion gp41 comprises two helices, HR1 
and HR2 (Fig. 3a); these form a trimeric six-helical bundle, with HR1 
helices arranged as an interior parallel coiled-coil, and exterior HR2 
helices packed in an anti-parallel fashion to bring N-terminal fusion pep- 
tides and C-terminal transmembrane regions into proximity. Distance 
difference analysis*® (Fig. 3b) of pre-fusion and post-fusion structures 
indicated two regions of structural similarity, corresponding to the pre- 
fusion α7 helix aligned with the C-terminal half of the post-fusion HR1 
helix (Fig. 3c, left) and the pre-fusion α9 helix aligned with much of the 
post-fusion HR2 helix (Fig. 3c, right). 

Superposition of pre-fusion α7 and post-fusion HR1 placed residues 
569gp41–593gp41 within 5 Å, with a root-mean-square deviation (r.m.s.d.) 
of 1.35 Å (Fig. 3c, left). For this superposition to occur, Cα movements 
of over 80 Å are required for the gp41 fusion peptide and α6 helix as well 
as for the C-terminal portion of the α9 helix. Notably, this superposition 
preserves the coiled-coil trimeric interactions of both pre-fusion and 
post-fusion molecules and thus is likely to mimic the natural 
conformational transition that occurs during membrane fusion. Meanwhile, 


Figure 3 | Entry rearrangements of HIV-1 Env. 
a, BG505 sequence“ of gp41, with pre-fusion and 
(FP) is underlined and labelled green. Several post-fusion 
gp41 structures have been determined, 
ranging from a minimal, protease-treated, crystal 
structure (residues 556gp41–581gp41, 628gp41–661gp41; 
PDB ID 1AIK*) with 80% sequence 
identity to BG505* to a more complete gp41 
structure (residues 531gp41–581gp41, 624gp41–681gp41; 
PDB ID 2X7R™) and an NMR structure 
that includes the cysteine loop (residues 
539gp41–665gp41; PDB ID 2EZO”) of the simian 
immunodeficiency virus (SIV), which shares 48% 
sequence identity with BG505"° and is substantially 
similar to the HIV-1 structures (less than 1 Å Cα 
r.m.s.d. between overlapping residues of 1AIK 
and 2EZO). The post-fusion structure used here for 
comparisons was constructed from a chimaera of 
HIV-1-SIV structures (Extended Data Fig. 3c). 
b, Difference distance analysis” of pre-fusion 
BG505 and post-fusion HIV-1-SIV chimaeric 
gp41. Secondary structure is indicated, along with 
missing residues of BG505 (548-568) and of SIV 
(611-614). c, Superposition of post-fusion gp41 
(grey) onto pre-fusion gp41 (rainbow) for α7 (left) 
and α9 (right) pre-fusion helices. d, HIV-1 Env 
entry rearrangements. Electron microscopy 
reconstructions (top row) with gp120 (middle) and 
gp41 (bottom) rearrangements between each 
conformational state highlighted with orange lines 
depicting movement of each Cα between 
conformations. Subunit models are shown in grey 
with modelling parameters and references 
provided in Extended Data Table 2. Antigenic 
recognition of each of these states is shown in 
Extended Data Fig. 5. 
superposition of pre-fusion α9 and post-fusion HR2 placed residues 
634gp41–664gp41 within 5 Å, with an r.m.s.d. of 3.58 Å (Fig. 3c, right); 
the substantial alignment of α9 and HR2 helices indicates that the HR2 
helix is preformed in the pre-fusion structure. 
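
The quantities quoted for these comparisons (the r.m.s.d. over paired residues, the share of residues within 5 Å, and the difference distance matrix) can be sketched as follows. This is a minimal illustration that assumes the two sets of Cα coordinates are already paired residue by residue and, for the first two functions, already superposed; the function names and coordinates are hypothetical and do not reproduce the software used in the study.

```python
import math


def pair_distances(coords_a, coords_b):
    """Distance between each pair of equivalent atoms (structures already superposed)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must pair up one to one")
    return [math.dist(p, q) for p, q in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over the paired atoms."""
    d = pair_distances(coords_a, coords_b)
    return math.sqrt(sum(x * x for x in d) / len(d))


def fraction_within(coords_a, coords_b, cutoff=5.0):
    """Fraction of paired atoms lying within `cutoff` of each other."""
    d = pair_distances(coords_a, coords_b)
    return sum(1 for x in d if x <= cutoff) / len(d)


def difference_distance_matrix(coords_a, coords_b):
    """|d_ij(A) - d_ij(B)| for every atom pair; independent of any superposition."""
    n = len(coords_a)
    if n != len(coords_b):
        raise ValueError("coordinate lists must pair up one to one")
    return [[abs(math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j]))
             for j in range(n)] for i in range(n)]


# Hypothetical two-atom example: the paired atoms are 3 and 4 units apart.
pre = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
post = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
deviation = rmsd(pre, post)          # sqrt((9 + 16) / 2)
```

Because the difference distance matrix compares internal distances only, it identifies regions that keep their local geometry (small entries) before any choice of superposition is made, which is why it is a natural first step ahead of the helix-by-helix alignments.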


Entry rearrangements of HIV-1 Env 


Biosynthesis of HIV-1 Env starts with an uncleaved gp160 trimer. After 
cleavage, the spike condenses into the pre-fusion mature closed structure 
described here. In the gp120 inner domain, helix α1 is formed, and 
a parallel strand exists between strands β3 and β21; in gp41, we observe 
helix α7 to begin around residue 571gp41. A partially open electron 
microscopy structure” has been reported at 6 Å, in which the trimer 
association domains appear to be displaced from the trimeric axis, and helical 
density suggests helix α7 to start several turns earlier; we modelled these 
rearrangements with a rigid body motion of 6 degrees for gp120 and the 
conversion of ~15 residues of helix α6 and connecting stretch into helix 
α7, which extends ~20 Å towards the target cell membrane (Fig. 3d, 
middle panel; Extended Data Table 2). 

The CD4-bound state has been visualized by a number of electron 
microscopy reconstructions*” and atomic-level structures”. In this 
state, V1V2 separates from V3: V3 points towards the target cell*®, and 
the bridging sheet’ assembles with strand β2 forming antiparallel 
hydrogen bonds with β21 (as opposed to the parallel β3-β21 interaction 
of the pre-fusion mature closed state; notably, the only parallel β-strand 
in the respiratory syncytial virus (RSV) F glycoprotein pre-fusion structure 
also changes conformation in RSV F pre- to post-fusion transition*’). 
With layer 1 of the inner domain”, helix α0 forms, and Gln 428gp120 and 
strand β21 invert; in layer 2, inner domain rearrangements include the 
swapping of distinct perpendicular interactions of Trp 112gp120 and 
Trp 427gp120 (Extended Data Fig. 4). CD4 binding allows HR2-peptide 
analogues (such as C34) to bind*’, and we can model helix α7 starting 
as early as 554gp41 with Met 530gp41 still in its membrane-proximal 
tryptophan clasp, as expected because 35022 binds the CD4-bound SOSIP.664 
(Extended Data Figs 3d, e and 5c, e). We envision that Env-CCR5 
interactions” bring the CD4-bound state close to the target cell membrane, 
where ‘disassembling α6/assembling α7 helices’ coupled to release 
of the Met 530gp41 prong from its tryptophan clasp ultimately 
amasses the gp41 fusion peptide(s) (Fig. 3d, second panel from right, 
Extended Data Fig. 3f). 

At this receptor-bound stage, it is easy to imagine the fusion peptide 
penetrating the target cell membrane, while strand β27 of the gp41- 
cysteine loop remains hydrogen-bonded to the gp120 termini (and the 
C terminus of the gp41 ectodomain remains in the viral membrane). 
Rearrangement of gp41 to its post-fusion conformation may be triggered 
by gp120 shedding™, with expulsion of the gp120 termini tugging on 
the gp41-cysteine loop and destabilizing pre-fusion gp41. 


HIV-1 rearrangements and other type I fusion machines 


To determine whether the distinct elements we observed in pre-fusion 
gp41 were preserved elsewhere, we examined pre-fusion and post-fusion 
states of other type I fusion machines from influenza virus**”® (a mem- 
ber of the Orthomyxoviridae family of viruses), RSV*"*’ (Paramyxovi- 
ridae), and Ebola virus**” (Filoviridae) (Fig. 4a). In all cases, a helix was 
observed in the gp41 pre-fusion equivalents, which corresponds in se- 
quence to the C-terminal portion of the helix that in the post-fusion 
conformation comprises the interior coiled-coil characteristic of type I 
fusion machines*” (Fig. 4b). With pre-fusion machines from HIV-1, 
influenza and Ebola viruses, the nascent pre-fusion helix adopts a coiled- 
coil; with RSV, a coiled-coil assembles immediately N-terminal to the na- 
scent post-fusion helix. Despite marked differences in gp120-equivalents, 
similarity was observed in the overall topology of subunit interactions. 
Notably, all of the gp41-equivalents wrapped hydrophobic residues 


Figure 4 | Pre-fusion HIV-1 gp120-gp41 
structure shares conserved structural and 
topological features with other type I fusion 
machines. a, Pre-fusion (left) and post-fusion 
(right) structures. The pre-fusion structures 

are shown for a single protomer in ribbon 
representation with gp120-equivalent subunits in 
red, and gp41-equivalent subunits in rainbow (blue 
to orange). The trimeric post-fusion structures 
are shown with one subunit in rainbow (blue to 
orange), and the other in light and dark grey. b, The 
C-terminal portion of the preformed interior helix 
of post-fusion coiled-coil from a is shown, with 
fusion peptides (FP) and N- and C-terminal 
residues of post-fusion coiled-coils labelled, and 
the distance the inner coiled-coil extends between 
pre-fusion and post-fusion conformations 
indicated. c, The gp41 equivalents encircle 
extended β-strands of their gp120-equivalent 
partners. Ribbon representations are shown 
gp120 equivalent (HA1) that is wrapped by the 
gp41 equivalent (HA2), with the N terminus of 
HA2 completing about 20% more than a single 
encirclement. With RSV, it is also only the N 
terminus of the gp120 equivalent (F2) that is 
wrapped by the gp41 equivalent (F1), and the 
termini do not have to be expelled to transition to 
the post-fusion form. With Ebola, the gp41 
equivalent (gp2) wraps both N and C termini 
strands of the gp120 equivalent (gp1), completing 
about 70% of a single encirclement. Such 
encirclement probably helps capture the energy of 
pre-fusion folding, which is released during the 
post-fusion transition to power membrane fusion. 
around extended termini (or N terminus) of their gp120-equivalents 
(Fig. 4c). Overall, the similarities in pre-fusion folding topology and in 
pre-fusion interior helices observed here, along with the previously 
observed similarity in post-fusion coiled-coils (reviewed in ref. 40), provide 
a more general and integrated view of the structural and conformational 
requirements of type I-mediated membrane fusion. 


Glycan shielding and genetic variation 

The pre-fusion mature closed conformation of HIV-1 Env is the target 
of most neutralizing antibodies. The newly revealed structure of a near- 
complete gp120-gp41 Env trimer ectodomain provides an opportunity 
to understand aggregate properties of glycosylation and variation. Glycan 
shielding and genetic variation have long been recognized as mechan- 
isms to avoid antibody recognition’. The BG505 SOSIP.664 sequence 
contains 28 sequons specifying N-linked glycosylation (including a T332N 
mutation). We modelled high mannose glycans (either Man5 or Man9) 
on each sequon and calculated accessible surface for radii ranging from 
Figure 5 | Fully assembled shield 
revealed by pre-fusion HIV-1 
gp120-gp41 trimer. a, Glycan 
shield. Env N-linked glycans are 
depicted in light green (conserved; 
greater than 90% conservation) or 
dark green (variable; less than 90% 
conservation) on the pre-fusion 
mature closed Env structures for 
BG505 strain of HIV-1 (left), 
influenza virus H3 haemagglutinin 
(HA) (PDB ID 2YP7) (middle), 
and RSV fusion glycoprotein subtype 
A (PDB ID 4JHW) (right). 

A conserved glycan at residue 
241gp120 not present in the BG505 
sequence is shown in yellow-green. 
b, Sequence variability. 
1.4 Å (the radius of a water molecule) to 10 Å (the approximate radius 
of a single immunoglobulin domain) (Extended Data Fig. 6). In the Man9- 
glycosylated model, 29% of the protein surface was solvent accessible, 
whereas only 3% of the surface was immunoglobulin-domain accessible. 
By contrast, with the fusion glycoproteins from influenza virus and RSV, 
14% and 48%, respectively, of these surfaces were immunoglobulin- 
domain accessible (Fig. 5a). 
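
The dependence of accessible surface on probe radius can be illustrated with a small numerical sketch. It uses a Shrake-Rupley style point test, which is a stand-in chosen for illustration, since this section does not state which algorithm was used; the atom coordinates, radii and function names are hypothetical.

```python
import math


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        points.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return points


def accessible_area(atoms, probe, n_points=500):
    """Probe-accessible area per atom; atoms are (x, y, z, radius) tuples in Å."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe                      # surface traced by the probe centre
        exposed = 0
        for ux, uy, uz in unit:
            p = (xi + big_r * ux, yi + big_r * uy, zi + big_r * uz)
            # A test point is exposed if the probe centre there clashes with no other atom.
            if all(math.dist(p, (xj, yj, zj)) >= rj + probe
                   for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i):
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


def accessible_fraction(atoms, probe, n_points=500):
    """Accessible area relative to the same atoms in isolation."""
    isolated = sum(4.0 * math.pi * (r + probe) ** 2 for _x, _y, _z, r in atoms)
    return sum(accessible_area(atoms, probe, n_points)) / isolated


# Two hypothetical atoms 3 Å apart, probed with a water-sized and a domain-sized sphere.
pair = [(0.0, 0.0, 0.0, 1.8), (3.0, 0.0, 0.0, 1.8)]
water_fraction = accessible_fraction(pair, probe=1.4)
domain_fraction = accessible_fraction(pair, probe=10.0)
```

Even in this two-atom toy case the larger probe sees a smaller accessible fraction, which is the same geometric effect that makes a densely glycosylated surface far less accessible to an immunoglobulin domain than to water.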

In terms of genetic variation, we calculated the per-residue Shannon 
entropy of 3,943 sequences of HIV-1 (Fig. 5b). Approximately 50% of 
the surface was shown to have a variability of greater than 10%, a degree 
of surface variation shared by influenza virus, but not by RSV. When 
we combined glycan shielding and genetic variation, only ~2% of the 
surface was immunoglobulin accessible with a variability of less than 10% 
(Extended Data Fig. 7, upper panels); much of this conserved surface 
occurred at the membrane-proximal ‘base’ of the spike, which is expected 
to be sterically occluded by the viral membrane. To determine how this 
fully assembled shield compared to other conformations, we also assessed 
Figure 6 | Location and prevalence on the HIV-1 
Env spike of neutralizing responses identified 
serologically from cohorts, 2-3 and 5+ years 
post-infection. a, The location of the 
neutralization epitopes for broadly neutralizing 
antibodies is depicted on the pre-fusion mature 
closed Env spike with red for CD4-binding-site- 
directed antibody specificities (VRC01-, b12-, 
CD4-, and HJ16-like), purple for 8ANC195-like, 
green for V1V2-directed (PG9-like), blue for 
glycan-V3 specificities (PGT128- and 2G12-like), 
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the immunoglobulin accessibility of the CD4-bound conformation (Ex- 
tended Data Fig. 7, lower panels). Notably, the CD4-bound conforma- 
tion showed substantially higher levels of glycan-free, conserved sur- 
face, consistent with the greater ease by which antibodies reactive with 
the CD4-bound conformation are elicited—and by contrast, the dif- 
ficulty in eliciting broadly neutralizing antibodies against the glycan- 
covered, sequence-variable pre-fusion closed state. 


Serologic recognition of pre-fusion closed Env 


Despite multiple mechanisms of immune evasion that shield mature 
HIV-1 Env, potent broadly neutralizing antibodies do develop. The 
structure of HIV-1 Env in the pre-fusion mature closed state allows us 
to map known epitopes on their most likely functional target (Fig. 6a) 
and to compare the recognition of broadly neutralizing HIV-1 antibodies 
with those capable of neutralizing influenza virus and RSV (Fig. 6b). 

To determine the location and prevalence of effective humoral responses, 
we used a serologic analysis based on serum neutralization of 
a panel of diverse HIV-1 isolates. Sera from a cohort that had been 
infected for 2-3 years and from another that had been infected for more 
than 5 years were assessed on a panel of 21 diverse HIV-1 isolates, and 
the neutralization phenotypes were assigned to 12 prototypic antibody-neutralization 
fingerprints (Fig. 6c, Extended Data Fig. 8a, b). We then 
mapped the responses to the surface of the mature closed HIV-1 Env 
spike (Extended Data Fig. 8c, d). The most prevalent response corre- 
sponded to the glycan-V3 epitope epitomized by antibody PGT128. 
CD4-binding site-directed responses, 8ANC195 responses, V1V2-directed 
responses, and 35022 responses were also prevalent after 5+ years. 
Overall, responses in both cohorts were generally in good agreement 
with each other, indicating little evolution in the location or prevalence 
of effective neutralizing responses between 2-3 and 5+ years. Notably, 
when mapping Env sites of vulnerability, the majority of prevalent sites 
corresponded to Env surfaces covered by N-linked glycosylation and/ 
or of high sequence variability. Indeed, both PGT122 and 35022, 
co-crystallized here, recognize N-linked glycan, and both utilize 
framework 3 insertions: in the light chain for PGT122 and in the heavy chain 
for 35022 (Extended Data Fig. 9). 


Viral evasion and immune recognition 


In addition to merging virus and host-cell membranes, viral fusion ma- 
chines must contend with antibody-mediated neutralization. With RSV, 
peak infection occurs at 5-10 months of life, as maternal antibodies wane; 
with influenza virus, natural infection elicits strain-specific antibodies, 
and evasion occurs seasonally on a global scale. HIV-1, however, con- 
fronts the immune system in each individual directly, often presenting 
high titre of Env antigens over years of chronic infection. These differ- 
ences in evasion are reflected by structural differences in the fusion 
machines. The structure of the HIV-1 Env spike revealed here allows 
the molecular trickery behind single-spike entry, glycan shielding and 
conformational masking to be visualized at the atomic level (Extended 
Data Fig. 10). Thus, avoidance of antibody avidity through the ability 
of a single HIV-1 spike to fuse viral and target cell membranes is likely 
to be assisted by membrane proximity of the co-receptor and membrane 
association of the membrane-proximal external region (MPER; 
Fig. 3); despite these differences, the HIV-1 Env spike appears to share 
mechanism and topology with other type I fusion machines (Fig. 4). In 
terms of glycan shielding, we have modelled the structure of a fully 
assembled glycan shield for BG505, a tier II transmitted-founder virus 
(Fig. 5). Although glycan masking appears to be complete at the HIV-1 
spike apex, closer to the viral membrane, ‘holes’ in the glycan shield are 
observed. And with conformational masking, evasion is optimal for 
the pre-fusion mature closed state, with receptor binding unmasking 
conserved glycan-free surfaces (Extended Data Fig. 7). Despite extraordinary 
glycosylation and sequence variation, the human immune system 
seems to be up to the challenge of generating HIV-1-neutralizing 
antibodies (Fig. 6). We note that recognition of glycosylation appears 
to be a trait common only to HIV-1-neutralizing antibodies and that 
both broadly neutralizing HIV-1 and influenza virus antibodies tolerate 
epitope sequence variation (Fig. 6b). The structure of the HIV-1 Env 
spike described here thus reveals not only commonalities in entry and 
evasion with other type I fusion machines, but also commonalities in rec- 
ognition by the human immune system. It remains to be seen whether 
an effective vaccine against HIV-1 can be developed by using the atomic- 
level detail provided here, which should allow for immunogen-design 
strategies such as conformational stabilization and nanoparticle delivery; 
additionally, antibody-type and ontogeny-specific strategies may 
be required, and template ontogenies are becoming available for some 
of the more commonly elicited HIV-1-neutralizing antibodies (Extended 
Data Fig. 8d), such as those against the CD4-binding site and V1V2 
site. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS 

BG505 SOSIP.664 expression and purification. The crystallized HIV-1 Env construct 
from strain BG505 was generated following published reports, using 
BG505 GenBank accession numbers ABA61516 and DQ208458; including the 
‘SOS’ mutations (A501C, T605C), the isoleucine to proline mutation at residue 559 
(I559P), and the glycan site at residue 332 (T332N); mutating the cleavage site to 
6R (REKR to RRRRRR); and truncating the C terminus to residue 664 (all HIV-1 Env 
numbering according to the HXB2 nomenclature). This construct is referred to as 
BG505 SOSIP.664 throughout this manuscript. 

The BG505 SOSIP.664 construct was co-transfected with furin in HEK 293 
GnTI−/− cells using 600 µg of BG505 SOSIP.664 and 150 µg of furin plasmid DNAs 
as described previously. Transfection supernatants were harvested after 7 days, 
and passed over either a 2G12 antibody- or VRC01 antibody-affinity column. After 
washing with phosphate-buffered saline (PBS), bound proteins were eluted with 
3 M MgCl2, 10 mM Tris pH 8.0. The eluate was concentrated to less than 4 ml with 
Centricon-70 and applied to a Superdex 200 column, equilibrated in 5 mM HEPES, 
pH 7.5, 150 mM NaCl, 0.02% azide. The peak corresponding to trimeric HIV-1 Env 
was identified, pooled, concentrated and used immediately or flash-frozen in liquid 
nitrogen and stored at −80 °C. 

Fab expression and purification. PGT122 and 35022 IgGs were expressed as 
previously described. Heavy chain plasmids containing an HRV3C cleavage site 
in the hinge region were co-transfected with light chain plasmids in 293F (35022) 
or GnTI−/− (PGT122, which is glycosylated) cells using TrueFect-Max transfection 
reagent (United Biosystems) according to the manufacturer’s protocol. Cultures were 
fed with fresh 293FreeStyle media (Life Technologies) 4 h post-transfection and 
with HyClone SFM4HEK293 enriched medium (HyClone) containing valproic acid 
(4 mM final concentration) 24 h after transfection. Cultures were then incubated at 
33 °C for 6 days, and supernatants were harvested and passed over a protein A affinity 
column. After PBS wash and low pH elution, the pH of the eluate was neutralized with 
1 M Tris pH 8.5. Fabs were obtained using HRV3C digestion and collecting the flow-through 
from the protein A column to remove the Fc fraction. Fabs were further purified 
over Superdex 200 in 5 mM HEPES, pH 7.5, 150 mM NaCl, 0.02% azide. 
Ternary complex preparation. PGT122 and 35022 Fabs were added to a solution 
of purified trimeric BG505 SOSIP.664 in fivefold molar excess for 30 min at 
room temperature. The complex was then partially deglycosylated by adding Endo 
H (50 µl) for 1 h at room temperature in the gel filtration buffer. The complex was 
then purified by gel filtration on a column equilibrated in 5 mM HEPES, pH 7.5, 150 mM NaCl, 
0.02% azide. Fractions were pooled, concentrated to 5-10 OD280 nm per ml 
and used immediately for crystal screening or flash-frozen in liquid nitrogen and 
kept at −80 °C until further use. 

Crystallization screening. The ternary complex was screened for crystallization 
using 576 conditions from Hampton, Wizard and Precipitant Synergy screens 
using a Cartesian Honeybee crystallization robot as described previously and a 
mosquito robot using 0.1 µl of reservoir solution and 0.1 µl of protein solution. 
Crystals suitable for structural determination were identified robotically in 0.2 M 
Li2SO4, 6.65% PEG 1500, 20% isopropanol and 0.1 M sodium acetate pH 5.5. Crystals 
were reproduced in hanging droplets containing 0.5 µl of reservoir solution and 
0.5 µl of protein solution. Optimal crystallization conditions were obtained in 16% 
isopropanol, 5.32% PEG 1500, 0.2 M Li2SO4, 0.1 M sodium acetate pH 5.5. Crystals were 
cryoprotected in a solution of 15% 2R,3R-butanediol, 5% isopropanol, 0.2 M Li2SO4, 
6.65% PEG 1500, 0.1 M sodium acetate pH 5.5, and flash-frozen after covering with 
paratone N. Data were collected at a wavelength of 1.00 Å at the SER-CAT beamline 
ID-22 (Advanced Photon Source, Argonne National Laboratory). 

X-ray data collection, structure solution and model building. Diffraction data 
were processed with the HKL2000 suite. The data were corrected for anisotropy 
using the anisotropy server http://services.mbi.ucla.edu/anisoscale/ with truncations 
to 3.5 Å, 3.5 Å and 3.1 Å along the a, b and c axes, respectively. Structure solution was 
obtained with Phaser using gp120 (PDB ID 4J6R), PGT122 (PDB ID 4JY5) and 
35022 Fv (PDB ID 4TOY) as search models. Refinement was carried out with 
Phenix, imposing PGT122, 35022 and gp120 model-based refinement restraints 
during the initial round of refinement. Model building was carried out with Coot. 
The Ramachandran plot as determined by MOLPROBITY showed 92.66% of all 
residues in favoured regions and 99.03% of all residues in allowed regions. Data 
collection and refinement statistics are shown in Extended Data Table 1. 
Preparation of fluorescently labelled virus. Fluorescently labelled virus was prepared 
as described. Briefly, for site-specific incorporation of fluorophores the Q3 
(GQQQLG) and A1 (GDSLDMLEWSLM) peptides were inserted into the V1 and 
V4 loops of HIV-1 JR-FL gp160 at positions 136 and 404 (HXB2 numbering), respectively. 
Virus for smFRET imaging was generated by cotransfecting HEK293 
cells with a 40:1 ratio of wild-type HIV-1 JR-FL gp160 plasmid pCAGGS to HIV-1 
JR-FL gp160 plasmid containing the Q3 and A1 labelling peptides, in addition to 
pNL4-3 Δenv ΔRT. The virus was harvested 24 h post-transfection, concentrated 
by centrifugation, and fluorescently labelled with donor and acceptor fluorophores 
through incubation with 0.5 µM Cy3B-cadaverine, 0.5 µM Cy5(4S)COT-CoA, 0.65 µM 
transglutaminase (Sigma), and 5 µM AcpS overnight at room temperature. The 
AcpS enzyme and the CoA-conjugated fluorophore were prepared as described. 
DSPE-PEG2000-biotin (Avanti) was then added to the reaction at a final concentration 
of 6 µM (0.02 mg ml−1), and the labelled virus was purified by ultracentrifugation 
for 1 h at 150,000g over a 6-18% Optiprep (Sigma) gradient. 
smFRET data acquisition and analysis. smFRET data were acquired and analysed 
as described. Fluorescently labelled virions were immobilized on streptavidin-coated 
quartz microscope slides and imaged on a prism-based total internal reflection 
fluorescence microscope. The donor fluorophore was excited by a 532-nm laser 
(Laser Quantum). The donor and acceptor fluorescence emissions were collected 
through a ×60 water objective (Nikon), split by a 650DCXR dichroic filter (Chroma), 
and focused on parallel EMCCD cameras (Photometrics). Movies were recorded at 
25 frames per s for 40 s. smFRET imaging was performed in buffer containing 50 mM 
Tris pH 7.5, 100 mM NaCl, a cocktail of triplet-state quenchers, and 2 mM protocatechuic 
acid and 8 nM protocatechuate 3,4-dioxygenase to remove molecular 
oxygen. Where indicated, surface-bound viruses were incubated with 0.1 mg ml−1 
PGT122 and/or 0.1 mg ml−1 35022 antibody. 

All data analysis was performed using custom-written Matlab software. Fluorescence 
trajectories were extracted from the movies, and used to calculate FRET efficiency 
according to FRET = I_A/(I_D + I_A), where I_A and I_D represent fluorescence 
intensities of the acceptor and donor fluorophores, respectively. smFRET trajectories 
were identified for analysis on the basis of their displaying sufficient signal-to-noise ratio and 
fluorophore lifetime. FRET trajectories were compiled into histograms, which were 
fit to the sum of three Gaussian distributions in Matlab. smFRET revealed that the 
HIV-1 Env is conformationally dynamic, transitioning between three distinct conformational 
states. Response to various ligands identified the low-FRET state as the 
closed unliganded conformation of HIV-1 Env and the intermediate- and high-FRET 
states as the activated conformations stabilized by coreceptor and CD4 binding. 
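
The efficiency calculation described above can be sketched as follows. This is a minimal illustration only, assuming background-corrected donor and acceptor intensities are available as per-frame lists; the function names, the bin count and the example intensities are hypothetical and are not taken from the custom Matlab software, and the three-Gaussian fit is not reproduced here.

```python
def fret_efficiency(i_donor, i_acceptor):
    """Apparent FRET efficiency per frame: I_A / (I_D + I_A)."""
    values = []
    for i_d, i_a in zip(i_donor, i_acceptor):
        total = i_d + i_a
        # Skip frames with no signal (for example after photobleaching).
        if total <= 0:
            continue
        values.append(i_a / total)
    return values


def fret_histogram(values, n_bins=10):
    """Normalized histogram of FRET values over the interval [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        if 0.0 <= v <= 1.0:
            index = min(int(v * n_bins), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical intensities (arbitrary units) for a four-frame trajectory.
donor = [900.0, 850.0, 300.0, 0.0]
acceptor = [100.0, 150.0, 700.0, 0.0]
trace = fret_efficiency(donor, acceptor)
histogram = fret_histogram(trace)
```

In practice, histograms compiled from many trajectories would then be fit to a sum of Gaussian components to estimate state occupancies.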
Binding studies using biolayer interferometry. A fortéBio Octet Red384 instrument 
was used to measure binding of BG505 SOSIP.664 and BG505 gp120 molecules 
to a panel of antibodies (VRC01, VRC03, b6, b12, F105, PGT122, PGT128, 
PGT135, 2G12, 8ANC195, 17b, 2.2C, 412d, 48D, 447-52D, PG9, PG16, PGT145, 
VRC26.09, 35022, PGT151) and CD4-Ig. All the assays were performed with agitation 
set to 1,000 r.p.m. in PBS buffer supplemented with 1% bovine serum albumin 
(BSA) in order to minimize nonspecific interactions. The final volume for all 
the solutions was 40-50 µl per well. Assays were performed at 30 °C in solid black 
tilted-bottom 384-well plates (Greiner Bio-One). Human antibodies (40-50 µg ml−1 
in PBS buffer) were used to load anti-human IgG Fc capture (AHC) probes for 600 s. 
Typical capture levels were between 1 and 1.5 nm, and variability within a row of 
eight tips did not exceed 0.1 nm. Biosensor tips were then equilibrated for 180 s in 
PBS/1% BSA buffer before binding assessment of the BG505 SOSIP.664 and BG505 
gp120 molecules in solution for 300 s; binding was then allowed to dissociate for 
300 s. Parallel correction to subtract systematic baseline drift was carried out by 
subtracting the measurements recorded for a sensor without monoclonal antibody 
incubated in PBS/1% BSA. Data analyses were carried out using Octet software, 
version 8.0. 

Difference distance analysis. Difference distance matrices were produced by 
distance sorting atom positions and plotting with the program DDMP. 

Surface plasmon resonance analysis. Affinities and kinetics of binding of antibodies 
35022 and PGT151 to BG505 SOSIP.664 soluble trimer were assessed by 
surface plasmon resonance on a Biacore T-200 (GE Healthcare) at 20 °C with buffer 
HBS-EP+ (10 mM HEPES, pH 7.4, 150 mM NaCl, 3 mM EDTA, and 0.05% surfactant 
P-20). In general, mouse anti-human Fc antibody was first immobilized 
onto two flow cells on a CM5 chip at ~10,000 response units (RU) with standard 
amine coupling protocol (GE Healthcare). Either CD4-Ig, 2G12 IgG or 17b IgG 
was then captured on both flow cells by flowing over a 200 nM solution at 5 µl min−1 
flow rate for two minutes. This was followed by a 1-min injection of 1 µM human Fc 
on both flow cells to block unliganded mouse anti-human Fc antibody. The captured 
2G12, CD4-Ig or 17b was used to immobilize BG505 SOSIP.664 trimer on only 
one flow cell, with no trimer captured on the other flow cell (reference cell). For 
capturing with 2G12 or CD4-Ig, 500 nM of unliganded trimer was used, whereas 
a complex of 500 nM trimer + 1,500 nM sCD4 was used for capturing with 17b. 
Antibody Fab fragment solutions in HBS-EP+ buffer, at twofold dilutions starting 
from 885 nM, 600 nM and 460 nM for 35022, PGT151 and PGT145, respectively, 
were injected over the captured trimer channel and the reference channel at a flow 
rate of 50 µl min−1 for 2 min and allowed to dissociate for 3-30 min depending on 
the rate of dissociation of each interaction. The cells were regenerated with two 10 µl 
injections of 3.0 M MgCl2 at a flow rate of 100 µl min−1. Blank sensorgrams were 
obtained by injection of the same volume of HBS-EP+ buffer without antibody Fab 
fragments. Sensorgrams of the concentration series were corrected with corresponding 
blank curves and fitted globally with Biacore T200 evaluation software using a 1:1 
Langmuir model of binding. The stoichiometry of binding of antibodies to the trimer 
was estimated by normalizing the maximal response (Rmax) values to the amount 
of trimer captured and performing linear regression analysis using the Rmax values 
for the antibodies with known stoichiometries. 
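
For orientation, the standard 1:1 Langmuir model named above can be written as a short simulation. This is a sketch under stated assumptions and not the Biacore evaluation software: the rate constants, Rmax and function names are hypothetical, and only the 885 nM starting concentration, the twofold dilutions and the 2-min association time come from the text. Global fitting of ka and kd across the concentration series is not shown.

```python
import math


def langmuir_response(t, conc, ka, kd, rmax, t_assoc):
    """Response (RU) of a 1:1 Langmuir interaction at time t (s).

    conc: analyte concentration (M); ka: association rate (M^-1 s^-1);
    kd: dissociation rate (s^-1); rmax: maximal response (RU).
    Analyte flows until t_assoc, after which only buffer flows.
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs  # steady-state response at this conc
    if t <= t_assoc:
        return r_eq * (1.0 - math.exp(-k_obs * t))
    r_end = r_eq * (1.0 - math.exp(-k_obs * t_assoc))
    return r_end * math.exp(-kd * (t - t_assoc))


def dissociation_constant(ka, kd):
    """Equilibrium dissociation constant KD = kd / ka (M)."""
    return kd / ka


# Hypothetical kinetic parameters, chosen only for illustration.
KA, KD_RATE, RMAX, T_ASSOC = 1.0e5, 1.0e-3, 100.0, 120.0

# Twofold dilution series starting from 885 nM.
concentrations = [885e-9 / 2 ** i for i in range(5)]
end_of_injection = [
    langmuir_response(T_ASSOC, c, KA, KD_RATE, RMAX, T_ASSOC)
    for c in concentrations
]
kd_molar = dissociation_constant(KA, KD_RATE)
```

A global fit would adjust a single ka, kd and Rmax so that the simulated curves match all blank-corrected sensorgrams of the series at once.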

Modelling of missing loops, side chains and the N-linked glycan shield. Missing 
loops not defined in the HIV-1 Env trimer crystal structure (V2 and V4) were 
modelled using Loopy. Missing side chains were modelled with Scap. 

To model the N-linked glycan shield, we first determined all possible N-linked 
sequons in the BG505 HIV-1 Env trimer sequence (28 sequons). All glycans observed 
in the structure were removed before modelling. A conserved glycan at residue 241 of gp120, 
not present in the BG505 sequence, was added. A single asparagine residue in each 
sequon was targeted for computational N-linked glycan addition using a series of 
oligomannose 9 rotamer libraries at different resolutions. In constructing the rotamer 
libraries, the asparagine side chain rotamers were also considered. To avoid a 
combinatorial explosion in the search space, select torsion angles in the oligomannose 
9 rotamer libraries were allowed to vary in increments of 30-60 degrees. 
We used an overlap factor (ofac) to screen for clashes between the sugar moieties 
and the trimer structure. The ofac between two non-bonded atoms is defined as the 
distance between the two atoms divided by the sum of their van der Waals radii. For 
the modelling carried out here, we set the ofac to a value of 0.60. For sterically occluded 
positions, the ofac was set to 0.55. To remove steric bumps between sugar 
moieties, all models were subjected to 100 cycles of conjugate gradient energy minimization 
using the GLYCAM force field in Amber12 with a distance-dependent 
dielectric. 
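
The overlap-factor screen defined above can be illustrated with a short sketch. Several points here are assumptions rather than details from the text: the van der Waals radii are Bondi values, the stated ofac setting is interpreted as a cutoff below which an atom pair counts as a clash, and the atom representation and function names are hypothetical.

```python
import math

# Bondi van der Waals radii in angstroms (assumed; the radii set used
# for the modelling is not stated in the text).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def overlap_factor(atom_a, atom_b):
    """Distance between two atoms divided by the sum of their vdW radii.

    Each atom is an (element, (x, y, z)) pair with coordinates in angstroms.
    """
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    distance = math.dist(xyz_a, xyz_b)
    return distance / (VDW_RADII[elem_a] + VDW_RADII[elem_b])


def has_clash(glycan_atoms, protein_atoms, ofac_cutoff=0.60):
    """True if any glycan-protein atom pair falls below the ofac cutoff."""
    return any(
        overlap_factor(g, p) < ofac_cutoff
        for g in glycan_atoms
        for p in protein_atoms
    )


# Hypothetical coordinates: one sugar carbon against one protein oxygen.
sugar = [("C", (0.0, 0.0, 0.0))]
near_protein = [("O", (1.5, 0.0, 0.0))]   # ofac of about 0.47, a clash
far_protein = [("O", (3.22, 0.0, 0.0))]   # ofac of about 1.0, no clash
```

A rotamer search would call a check of this kind for each candidate glycan conformation, relaxing the cutoff to 0.55 at sterically occluded positions.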

Mapping sequence variability onto trimer structure. For each of HIV-1 Env, 
influenza virus HA, and RSV F, residue sequence variability was computed as the 
Shannon entropy for each residue position, based on representative sets of 3,943 
HIV-1 strains, 4,467 influenza virus strains, and 212 RSV strains, respectively. Residues 
were coloured based on the computed entropy values, on a scale of white (conserved) 
to purple (variable). 
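
The per-position entropy calculation can be sketched as follows. This is a minimal example assuming pre-aligned sequences of equal length; the logarithm base (2), the treatment of gap characters as ordinary symbols, and the toy alignment are assumptions and are not specified in the text.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (bits) of the residues observed at one position."""
    counts = Counter(column)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def per_residue_entropy(aligned_sequences):
    """Entropy at every column of a set of aligned sequences."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [
        shannon_entropy([seq[i] for seq in aligned_sequences])
        for i in range(length)
    ]


# Hypothetical four-sequence alignment, three positions long.
alignment = ["ACD", "ACE", "ACD", "AGE"]
entropies = per_residue_entropy(alignment)
```

A fully conserved position gives an entropy of zero, and the value rises as more residue types appear with more even frequencies, which is what the white-to-purple colour scale encodes.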

Chronically infected cohort information. In the CHAVI 001 cohort, high-risk 
subjects were screened for HIV-1 infection by ELISA, western blotting, and plasma 
RNA to recruit individuals with acute HIV infection, who were then followed 
for ~2 years until plasma neutralization breadth developed. In addition, a group 
of individuals were enrolled in the CHAVI 001 or CHAVI 008 cohorts who were 
chronically infected with HIV-1 strains clade A, B or C, and were screened for plasma 
neutralization breadth. The trial participants were enrolled at sites in Tanzania, 
South Africa, Malawi, the United States, and the United Kingdom. Both CHAVI 001 
and CHAVI 008 protocols were approved by the institutional review boards of each 
of the participating institutions where blood samples were received or processed 
for analysis, and informed consent was obtained from all subjects. 

Serum neutralization fingerprinting analysis. The prevalence of effective neutralizing 
responses against HIV-1 Env in cohorts from 2-3 and 5+ years post-infection 
was estimated using a neutralization fingerprinting approach, as described 
previously. Briefly, serum neutralization over a set of 21 diverse viral strains was 
compared to neutralization of the same viruses by a set of broadly neutralizing 
antibodies grouped into 12 epitope-specific antibody clusters. For each serum, the 
relative prevalence of each of the 12 antibody specificities was estimated by representing 
serum neutralization as a linear combination of the monoclonal specificities, 
with prevalence values of 0.2 deemed as positive. Sera with less than 30% 
breadth on the 21-virus panel as well as sera with high residual values from the computation 
(data not shown) were not included in the analysis. For mapping prevalence 
values onto the BG505 SOSIP.664 structure, residues that were part of multiple antibody 
epitopes were coloured according to the respective antibody specificity with the 
highest prevalence in the 5+ years cohort. Antibody neutralization was measured 
using single-round-of-infection HIV-1 Env-pseudoviruses and TZM-bl target cells, 
as described previously. Neutralization curves were fit by nonlinear regression 
using a 5-parameter Hill slope equation as previously described. 

Epitope analysis for HIV-1 Env, influenza virus HA and RSV F antibodies. 
Glycan usage and average residue entropy were calculated for seven representative 
HIV-1 Env (VRC01, b12, CD4, 8ANC195, PG9, PGT122, 2G12 and 35022), 
four representative influenza virus HA (2D1, C05, F10 and CR8043), and three 
representative RSV F (D25, Motavizumab and 101F) epitopes based on their 
respective crystal structures. The selection of the flu antibodies was done as follows: 
F10 (stem targeting) and C05 (head targeting) were selected based on their cross-neutralizing 
ability for group 1 and group 2 of influenza virus A. CR8043 (group 2 
specific) and 2D1 (H1 specific), which target distinct regions from F10 and C05 at 
the stem and head of the HA respectively, were also selected for epitope analysis. 
An antigen residue was defined as an epitope residue if it had a non-zero buried surface 
area in the crystal structure. The fraction of glycan surface area in an epitope 
was calculated as the buried surface area of epitope glycans divided by the buried 
surface area of the full epitope. An unpaired nonparametric Mann-Whitney test was 
used to quantify the statistical difference between glycan fraction or average residue 
entropy for HIV-1 versus influenza virus or RSV antibody epitopes. 
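
The glycan-fraction definition and the rank comparison above can be illustrated with a small sketch. All numbers below are hypothetical placeholders and are not values from this study; the function names are illustrative, and only the Mann-Whitney U statistic is computed (a P value would additionally require the null distribution of U, for example as provided by `scipy.stats.mannwhitneyu`).

```python
def glycan_fraction(buried_glycan_area, buried_total_area):
    """Fraction of an epitope's buried surface contributed by glycans."""
    return buried_glycan_area / buried_total_area


def mann_whitney_u(sample_x, sample_y):
    """Mann-Whitney U statistic for sample_x versus sample_y.

    Counts pairs in which an x value exceeds a y value; ties count 0.5.
    """
    u = 0.0
    for x in sample_x:
        for y in sample_y:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical glycan fractions for two groups of epitopes.
group_a = [glycan_fraction(300.0, 1000.0), 0.5, 0.4]
group_b = [0.0, 0.1]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)
```

The two U values always sum to the product of the sample sizes, so a value of U equal to that product (or to zero) indicates complete separation of the two groups.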
Figures. Structure figures were prepared using PYMOL. PDB IDs are referenced 
throughout except for 2HMG, 1QU1, 4MMU, 3RRT, 3CSY and 2EBO 
in Fig. 4; 2YP7 and 4JHW in Figs 5 and 6 and PDB ID 3HI1 in Extended Data 
Table 2. 

Interfaces. Interactive surfaces were obtained from PISA (http://www.ebi.ac.uk/pdbe/pisa/). 

Extended Data Figure 1 | Antibody-mediated crystallization and antibody-induced 
conformation. a, Atomic-level structures for HIV-1 Env regions 
determined in complex with HIV-1-neutralizing antibodies. Neutralizing 
antibodies generally recognize the pre-fusion conformation of HIV-1 Env. 
Structures highlighted here display a cumulative sum total of pre-fusion HIV-1 
Env structural information. Env residues are numbered according to standard 
HXB2 numbering (from PDBs). One structure, for antibody D5 (blue), is in 
the post-fusion gp41 conformation, and is not included in the sum total. 
Regions of other structures (purple) did not define sequence register, and were 
also not included in the sum total. References listed in this figure are cited 
elsewhere in the manuscript, except for Rini et al. (1993), Stanfield et al. 
(1999), Ofek et al. (2004), Cardoso et al. (2005), Luftig et al. (2006) and 
Cardoso et al. (2007). b, Antibody-induced conformation of HIV-1 Env in the 
context of infectious JR-FL virions as assessed by smFRET. HIV-1 JR-FL 
gp160 was labelled with fluorescent dyes in variable regions, V1 and V4, at 
positions that did not interfere with Env function (see Methods), and virus was 
surface-immobilized for imaging via total internal reflection fluorescence 
microscopy. smFRET trajectories were compiled into histograms for the HIV-1 
JR-FL Env trimer, either unliganded or after pre-incubation for 30 min with 
0.1 mg ml−1 PGT122, 35022, or both PGT122 and 35022 before imaging. 
Resultant Env conformational landscapes could be deconvoluted into three 
Gaussian distributions: a low-FRET population that predominated for the 
pre-fusion mature unliganded state, and intermediate- and high-FRET 
populations, which predominated in the presence of CD4 receptor and CD4-induced 
antibody. smFRET trajectories are shown for the unliganded 
HIV-1 JR-FL Env trimer as well as in the presence of PGT122, 35022, and both 
PGT122 and 35022. The concordance between conformational ensembles 
indicates unliganded and PGT122+35022-bound conformation to be similar 
(Spearman correlation coefficient of 0.988). Interestingly, the presence of 
just one of the antibodies (PGT122) appeared to reduce the high-FRET 
population, an effect not observed in the presence of both antibodies; this 
suggests that the antibody-induced stability of a particular state is not solely 
additive, and that antibodies can both induce a particular conformational 
state as well as alter the transition dynamics from that state. 
[Extended Data Figure 2 panel labels: view from trimer apex (top), side view, view from viral membrane (bottom); 7-stranded sheet. Panel f shows the BG505 SOSIP.664 sequence of gp120 (numbered from residue 31) and gp41 (residues 512-664).] 
Extended Data Figure 2 | HIV-1 subunit interactions: principal component 
analysis and interface contacts. a, Minimum-bounding box, generated by 
principal component analysis, encasing 90% of the HIV-1 Env gp120-gp41 
protomer. Each gp120-gp41 blade forms a rectangle of height of ~100 Å, 
width of ~65 Å, and thickness of ~35 Å. Subunits are displayed in ribbon 
representation with gp41 coloured rainbow and gp120 coloured and labelled 
red. As previously visualized, the membrane-distal portion of the rectangle 
is made up of the gp120-outer and -inner domains, with the central 7-stranded 
β-sandwich of the inner domain occupying the trimer-distal, membrane-proximal 
portion of gp120. We have now resolved the rest of the spike: the 
membrane-proximal portion of the rectangle is made up of gp41, with the 
membrane-distal portion of gp41 closest to the molecular threefold axis 
occupied by helix α7 (which corresponds in register to the C-terminal portion 
of the post-fusion HR1 helix of gp41), and the rest of gp41 folding around N 
and C termini strands of gp120, which extend over 20 Å towards the viral 
membrane. Of the four helices, α6 kinks at residue 537 (gp41) and α9 kinks at 
residue 637 (gp41); backbone H-bonding is less ideal at residues 663 (gp41) and 
664 (gp41). b, Different views of trimeric protomer association. The protomer 
association at the membrane-distal trimer apex occurs through the corners of 
the minimum-bounding box, whereas the association at the membrane-proximal 
region occurs with substantial interpenetration of the minimum-bounding 
box; these interaction differences and the protruding nature of the 
gp120 outer domain result in the overall mushroom shape of the trimer. 
c, gp120-gp41 interface. Ribbon representation of gp120 (red) and gp41 
(rainbow from blue N terminus to orange C terminus), with gp120 residues that 
interact with gp41 shown in surface representation and gp41 residues that 
interact with gp120 shown in semi-transparent surface. A complete list of 
subunit interactions is provided in Supplementary Table 1. Membrane-proximal 
interactions are further stabilized by hydrophobic interactions, which 
gp41 makes with the N and C termini of gp120, such as between Trp 35 (gp120) and 
Pro 609 (gp41) and between Trp 610 (gp41) and Pro 498 (gp120). d, Wheel diagram 
representation of the α7 coiled-coil in the pre-fusion mature closed conformation 
of gp41 as generated by DrawCoil 1.0: http://www.grigoryanlab.org/drawcoil/ 
(ref. 94). e, gp41-trimer interfaces as viewed from the viral membrane in 
ribbon and surface representation (90° rotation from Fig. 2c). f, BG505 
SOSIP.664 sequence with residues identified by mutagenesis to be 
important for gp120/gp41 association underlined. Residues that were found to 
interact between gp120 and gp41 by examination of the crystal structure are 
indicated in red (intra-protomer interactions) and in brown (inter-protomer 
interactions). Sites of N-linked glycosylation are shown in green; glycan N88 
is shown in red because it is part of the gp120/gp41 interactions; no density 
was observed for potential N-linked glycans at residues 185, 398, 406, 411, 
462 and 625. Residues that were disordered in the crystal structure are grey. 
SOS (A501C/T605C) and IP (I559P) mutations are labelled in bold and italics. 
Dots indicate residues not present in the BG505 sequence. 
[Extended Data Figure 3 panel labels: c, alignment of gp41 sequences (residues 512-664) from the pre-fusion structure with HIV-1 and SIV post-fusion structures, with secondary structure indicated; d, binding residues of fusion-intermediate entry inhibitors and antibody mapped onto pre-fusion HIV-1 envelope; e, entry inhibitors and antibody docked onto fusion-intermediate model of gp41; f, pre-fusion gp41 tryptophan-clasp residues target a previously defined pocket on post-fusion gp41.] 

Extended Data Figure 3 | Modelling of gp41: pre-fusion α6-to-α7 density, 
HIV-1-SIV post-fusion chimaera, and liganded interactions. a, Modelling 
of gp41 residues 548-568. At low contour, suggestive density is observed that 
might correspond to the connection between α6 and α7 helices. This density 
appeared to be crystal dependent and might be related to inherent flexibility, 
functional rearrangements, asymmetry between protomers, or combinations 
of these factors. To investigate the degree to which a model for this region might 
be defined, we built and refined two different models for this region: electron 
density (blue) shown for 2Fo − Fc density at 1σ contour; gp41 (rainbow colour 
from blue to orange) shown in ribbon representation with side chains; gp120 
(red) shown in ribbon representation. The location of the I559P mutation is 
indicated. b, The two models from panel a are superimposed and shown in 
perpendicular orientations. c, HIV-1-SIV post-fusion chimaera. Sequences of 
HIV-1 gp41 from pre-fusion structure (BG505 strain, PDB ID 4TVP), post-fusion 
structure (HIVpost, PDB ID 2X7R) and SIV gp41 post-fusion structure 
(SIVpost, PDB ID 2EZO) are aligned with secondary structure indicated. 
Residues that were used to make the post-fusion HIV-1-SIV chimaera used 
in Fig. 3 are highlighted in red. d, Binding residues of representative 
fusion-intermediate entry inhibitors or antibodies mapped onto the structure 
of pre-fusion HIV-1 Env spike. Top, ribbon representation of pre-fusion 
envelope protomer A (gp120 in red and gp41 in blue) at two orientations, with 
the binding residues of the fusion-intermediate inhibitors 5-helix and 
T20 and of monoclonal antibody D5 shown in orange, green and yellow, 
respectively. Bottom, surface representation of the pre-fusion envelope trimer, 
with inhibitor and antibody binding residues mapped onto the surfaces of 
all protomers (A, B, C). gp120 is coloured grey and gp41 is coloured in shades of 
blue, depending on protomer. Binding residues of fusion-intermediate 
inhibitors 5-helix, T20 and monoclonal antibody D5 are shown in the same colour 
shades as in the top panels. e, 5-helix, T20 and D5 Fab (all coloured magenta 
and grey) docked onto a model of fusion-intermediate gp41 (coloured as in d). 
f, A previously defined binding pocket on post-fusion gp41 is recognized by 
pre-fusion gp41 tryptophan-clasp residues Trp 628 and Trp 631. Shown is a 
surface representation of gp41 5-helix protein (left, with N-heptad repeat 
(NHR) helices coloured in shades of green and C-heptad repeat (CHR) 
helices coloured in shades of orange). The footprint of gp41 tryptophan-clasp 
residues Trp 628 and Trp 631 is shown in magenta (middle) and that of 
a representative NHR-specific neutralizing antibody, D5, in yellow 
(right). 

[Extended Data Figure 4, panel labels. a, Pre-fusion gp120 states: mature closed 
versus CD4-bound (r.m.s.d. 1.485 Å, 2,178 atoms). b, Layer 1 changes (α0); 
β20-β21 changes.]

Extended Data Figure 4 | Conformational changes between pre-fusion 
mature closed state and CD4-bound state of gp120. a, Overall structure and 
sequence comparison. gp120 is shown in ribbon representation in pre-fusion 
mature closed (red) and CD4-bound (yellow, PDB ID 3JWD) conformation. 
V1V2 (PDB ID 3U2S) and V3 (PDB ID 2B4C) have been modelled onto 
the CD4-bound conformation. Secondary structure is defined for pre-fusion 
and CD4-bound conformation on the BG505 sequence, with cylinders 
representing α-helix and arrows β-strands. Disordered residues are indicated 
by X. Residues that move more than 3 Å between the mature closed and the 
CD4-bound gp120 conformations are highlighted by grey shadows. Sites of 
N-linked glycosylation are shown in green. b, Details of conformational 
changes between the mature closed (red) and the CD4-bound conformations 
(yellow) of gp120 (shown in ribbon). Regions highlighted cover layer 1 with 
changes at α0 (we note that density in this region is not well defined), layer 2 
with changes at α1 and β20-β21 rearrangements. All-atom r.m.s.d. values are as 
follows: gp120 residues 54-74, r.m.s.d. = 4.759 Å; gp120 residues 98-117, 
r.m.s.d. = 0.497 Å; gp120 residues 424-436, r.m.s.d. = 3.196 Å. 
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Extended Data Figure 5 | Antigenic profiles of HIV-1 envelope 
conformational states. a, Qualitative recognition of HIV-1 envelope by 
diverse antibodies is shown for five conformational states. Green bars indicate 
reported recognition and red bars no recognition; absence of a bar indicates that 
recognition is undefined. The compiled data are from cited references and 
experiments described in this figure. Note references 112-127 are cited here. 
b, Octet Biosensorgrams of BG505 SOSIP.664 (left) and BG505 gp120 
(right) binding to human monoclonal IgGs. The dotted line indicates the 
beginning of the dissociation phase and the maximal specific binding after 300 s 
reported in the table (−, <0.05 response units (RU); +, 0.05 RU to 0.25 RU; 
++, 0.25 RU to 0.5 RU; and +++, >0.5 RU). BG505 gp120 did not contain the 
T332N mutation (no glycan at that position). Both proteins were made in 
GnTi−/− cells. We note that antigenicity of the BG505 SOSIP.664 and BG505 gp120 
protein varied depending on the assay done. Thus, using surface plasmon 
resonance (SPR), no CD4i antibody binding was detected while some binding 
could be observed using biolayer interferometry. Although PG9 bound BG505 
gp120 in ELISA, it did not bind in biolayer interferometry format. We 
observed 447-52D binding, while it was not observed in previously published 
ELISA. c, SPR binding affinities of 35022, PGT151 and PGT145 to BG505 
SOSIP.664 and influence of sCD4. d, Estimation of binding stoichiometry for 
35022, PGT151, and PGT145 to trimeric BG505 SOSIP.664 by SPR and 
comparison to published data. e, Effect of sCD4 and sCD4/17b on 
binding of antibodies 35022 and PGT151 to BG505 SOSIP.664 by SPR. The 
structure of a pre-fusion mature closed state of HIV-1 provides a critical 
addition to the pantheon of HIV-1 Env structures with atomic-level detail. 
Moreover, antibodies 35022 and PGT151, which bind specifically to the 
trimeric pre-fusion conformation of gp41, provide new tools by which to assess 
the conformational state of gp41. The binding of antibodies 35022 and 
PGT151 to BG505 SOSIP.664 trimer was tested in the presence of the CD4 
receptor and the 17b antibody (a co-receptor surrogate which recognizes a 
bridging sheet epitope that overlaps the site of co-receptor recognition). In the 
case of antibody 35022, CD4 binding to the BG505 SOSIP.664 trimer affected 
the kinetics, affinity and stoichiometry of binding. 35022 bound to BG505 
SOSIP.664 with an 8.4-fold reduced affinity, primarily contributed by an 
increased rate of dissociation. The overall binding level (Rmax) normalized to 
the average level of trimer captured (see also panel d) was lower, suggesting 
substoichiometric binding. Capturing the trimer on a CD4-Ig surface reduced 
normalized Rmax for PGT151 compared to the 2G12 capture format, suggesting 
reduced stoichiometry for PGT151 binding to trimer pre-bound with CD4, 
although kinetics and affinity of interaction were similar. A BG505 SOSIP.664 
trimer + sCD4 complex captured onto a 17b surface bound 35022 but showed 
no detectable binding to PGT151. 

a  Man9 N-linked glycan coverage for trimers using probe radii of 1.4 and 10 Å 

Probe radius (R = 1.4 Å) 
          Fraction glycan covered   Fraction glycan free 
HIV       0.71                      0.29 
HIV241    0.72                      0.28 
Flu       0.54                      0.46 
RSV       0.26                      0.74 

Probe radius (R = 10.0 Å) 
          Fraction glycan covered   Fraction glycan free 
HIV       0.97                      0.03 
HIV241    0.97                      0.03 
Flu       0.86                      0.14 
RSV       0.52                      0.48 

c  [Surface images at probe radii of 1.4 Å and 10 Å] 


Extended Data Figure 6 | N-linked glycan occlusion of type I fusion 
machines. The pre-fusion mature closed conformation of HIV-1 Env evades 
the humoral immune response with a fully assembled glycan shield. Here we 
calculate and display the solvent-accessible surface of glycan and protein for 
HIV-1 Env, HIV241 (which contains an added glycan at position N241), 
influenza virus haemagglutinin and RSV fusion glycoprotein. The percentage 
coverage of the protein surface was calculated for trimeric 
type I fusion machines using two probe sizes: 1.4 Å (solvent radius) 
and 10.0 Å (the estimated steric footprint of an antibody combining region). 
Surface area calculations were carried out according to Kong et al., and images 


b  Man5 N-linked glycan coverage for trimers using probe radii of 1.4 and 10 Å 

Probe radius (R = 1.4 Å) 
          Fraction glycan covered   Fraction glycan free 
HIV       0.63                      0.37 
HIV241    0.63                      0.37 
Flu       0.44                      0.56 
RSV       0.19                      0.81 

Probe radius (R = 10.0 Å) 
          Fraction glycan covered   Fraction glycan free 
HIV       0.94                      0.06 
HIV241    0.95                      0.05 
Flu       0.77                      0.23 
RSV       0.42                      0.58 

d  [Surface images at probe radii of 1.4 Å and 10 Å] 

were generated using Grasp v1.3. All models were refined using Amber 
with the GLYCAM force field (see Methods for details). The PDB IDs 
associated with the glycosylated models are: 4TVP (HIV-1), 2YP7 (Flu) 
and 4JHW (RSV). The strains associated with the PDB IDs are: 
BG505.SOSIP.664 (HIV-1), H3N2 A/Hong Kong/4443/2005 (Flu) and 
A/A2/61 (RSV). The solvent-accessible protein surface is shown in red, and 
N-linked glycans are shown in green. a, Estimated Man9 glycan coverage. 
b, Estimated Man5 glycan coverage. c, Visualization of Man9 N-linked glycan 
coverage for two probe radii. d, Visualization of Man5 N-linked glycan 
coverage for two probe radii. 
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Extended Data Figure 7 | Glycan shield and sequence variability for HIV-1 
pre-fusion mature closed and CD4-bound conformations. Many 
conformations of HIV-1 Env divert the immune response. Thus for example, 
shed gp120 and post-fusion gp41 represent dominant viral antigens; however, 
these forms of Env are not functional, and antibodies that only target them are 
not neutralizing. Functional conformations, however, may be significantly 
shielded from the neutralizing antibody. The CD4-bound conformation of 
HIV-1 Env, for example, is only functionally present when the viral and target- 
cell membranes are in close proximity, and the exposed co-receptor binding site 
(including V3- and CD4-induced epitopes) is spatially occluded from 
neutralizing antibody. Here we provide models for the pre-fusion closed state 
versus the CD4-bound conformation, which display the fully assembled glycan 
shield and surface Env variability. Env N-linked glycans are depicted in 
light green (conserved; greater than 90% conservation) or dark green (variable; 
less than 90% conservation) on the mature closed Env structure and modelled 
CD4-bound conformation. Env sequence variability is shown from white 
to purple (conserved to variable). A conserved glycan at residue 241 of gp120, not 
present in the BG505 sequence, is shown in yellow-green. As can be seen, 
the pre-fusion closed state has few glycan-free surfaces, whereas the 
CD4-bound state exposes substantial glycan-free conserved surface. 

a Serum ID 6101.10 Bal.01 BG1168.01 CAAN.A2 DU156.12 DU422.01 JRCSF.JB JRFL.JB KER2018.11 PVO.04 Q168.a2 Q23.17 Q769.h5 RW020.2 THRO.18 TRJO.58 TRO.11 YU2.DG ZA012.29 ZM106.9 ZM55.28a 
703010505 65 376 a1 62 304 220 242 1370 123 40 220 180 62 320 <40 <40 61 250 50 56 65 
703011754 124 1097 69 280 1535-895 813 335 572 <40 1841209349 524 42 <40 <40 455 415 193 79 
703010694 158 314 <40 57 150 53 1258 «628 40 282 116 435 34 1073 <40 125 144 264 252 90 <40 
703010848 240 303 <40 899 144 139 273 474 <40 135 <0 «1483. <4D 5883. < AO <40 434 122 136.5 = 440 239 
703011244 126 1M 40 44 106 152 258 79 44 <40 <40 124 84 296 47 <40 116 40 <40 <40 <40 
703010010  <40 283 <40 <40 40 <40 165 <40 <40 53 4 1 44 va 58 <40 67 104 <40 <40 43 
700010592 <40 351 <40 63 54 <40 89 <40 <40 405 <40 270 <40 <40 <40 40 68 <40 <40 <40 46 
704010083 <40 <40 <40 127 99 68 52 7 <40 <40 <40 <40 <40 102 <40 <40 <40 <40 67 110 <40 
702010047 <40 184 <40 <40 40 108 = 5837S <0 <40 <40 90 <40 <40 51 224 <40 196 <A0 <40 <40 <A0 
707010457 113 152 <40 44 390 467 87 700 146 1304351 83 613 1591 217 740 684 841 187 170 698 
703011852 437 287 <40 439 998 974 553 1245 76 362 87 826 7 7041 <40 91 2652 786 403 301 389 
707010219 © 157 175 187 179 219 142 55981-2203 3421 1835 3640= 7177S 19080 «1172,='s 7584 «= 3862792 976 219 487 «5522 
703011749 208 241 <40 87 998 310 5221-3688 <40 1183 56 755 177232 60 232 602 174 <40 750 153 
704010581 273 1193 69 282 787 337 1415-652 199 1009 136 639 1092129267 985 2695 892 178 306 163 
703010547 98 121 <40 127 1324-508 401 459 57 226 58 996 <40 4754 <0 57 1912 583 54 481 130 
703010401 189 472 188 477 3179—s«8 1653-876 780 77 1869 992 1193 760 937 1876 © 1962S 490 359 706 1628 
707010763 548 857 <40 163347 1913 208 1209 3140 300 19254827 <AO 2155 <0 <40 854 207 299 551 146 
704010210 246 298 78 <40 3107. 400 150 60 <40 2m <40 78 <40 329 78 <40 168 <A0 63 <40 <40 
707010175 83 81 48 71 400 170 322 244 66 325 118 861 76 1335 182 118 272 144 <40 149 73 
703010269 138 639 <40 <40 444 97 2514 417 <40 4044 <40 1160 <0 606 208 <40 414 120 128 72 61 
702010440 142 89 57 117 364 154 609 306 42 802 73 88 <40 613 110 275 562 93 <40 87 59 
705010741 <40 101 <40 <49 4ir 140 71 <40 <40 45 60 94 <40 66 <40 134 <40 <40 <40 50 <40 
704010343218 514 <40 96 297 162 1397 842 <40 875 <40 107 <40 1629 46 594 148 194 <40 252 109 
705010765 287 260 <40 517 769 458 1630 658 <40 427 <40 245 <40 2514 57 112 1043 221 424 804 151 
702010432 «64 279 <40 164 197 76 996 191 1413 61 <40 2108 = <0 322 <40 97 153 58 248 955 55 
704010461 86 214 <40 <40 549 58 59 74 <40 208 <40 67 <40 125 <40 337 456 130 <40 232 <40 
707010562 © 94 458 <40 109 340 97 9672-1132 <40 81 787 = «2467, 4333603 “1 145 647 360 122 383 174 
706010383 292 374 64 1216 198 247 we 123 50 1092154 94 <40 193 153 255 522 4 50 150 92 
704010540 © 135 193 <40 <40 176 50 131 181 <40 83 281 46 113 319 165 345 205 129 83 219 99 
713080258 © <40 534 51 82 51 83 71 310 59 81 157 <40 58 51 <40 200 154 53 51 120 58 
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Extended Data Figure 8 | Prevalence of neutralizing responses identified 
serologically from cohorts from 2-3 years and 5+ years post infection. 
a, Serum neutralization on 21-strain virus panel. ID50s (reciprocal dilution at 
which 50% of the virus is neutralized) are shown for serum (rows) titrated 
against HIV-1 viral strains (columns). b, For each serum, the predicted 
neutralization prevalence for each of 12 antibody specificities is shown based 
on neutralization of 21 diverse HIV-1 strains. Values of at least 0.2 were 
considered positive and counted toward the overall cohort prevalence 
percentages in Fig. 6c. c, Prevalence of antibody specificities mapped onto the HIV-1 
Env, coloured as indicated in the bar graph. d, The antibody specificities for 
high serum prevalence in the 5+ years cohort are depicted by Fabs of 
representative antibodies binding the BG505 SOSIP.664 Env trimer, shown in 
grey ribbon representation, with glycans as green sticks. Note that while 
prevalence between the two cohorts showed good correspondence, there were 
notable differences, for example, between PGT151 at 2-3 years and 5+ years 
in this study as well as between the cohorts analysed here and in ref. 13. 
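
The ID50 definition in panel a (the reciprocal dilution at which 50% of the virus is neutralized) can be illustrated with a minimal sketch. It assumes a simple log-linear interpolation between the two tested dilutions that bracket 50% neutralization; the function name `id50` and the example titration values are hypothetical, and the source does not state how the reported ID50s were actually fitted. A return value of `None` corresponds to entries reported as below the lowest dilution tested (shown as <40 in the table).

```python
import math


def id50(reciprocal_dilutions, percent_neutralized):
    """Interpolate the reciprocal dilution giving 50% neutralization.

    Assumption (not from the source): log-linear interpolation between
    the two tested dilutions that bracket 50% neutralization.
    Returns None when 50% is not reached at the lowest dilution tested.
    """
    pairs = sorted(zip(reciprocal_dilutions, percent_neutralized))
    if pairs[0][1] < 50.0:
        return None  # below the assay's starting dilution
    for (d_lo, n_lo), (d_hi, n_hi) in zip(pairs, pairs[1:]):
        if n_lo >= 50.0 > n_hi:
            frac = (n_lo - 50.0) / (n_lo - n_hi)
            log_id50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10.0 ** log_id50
    # Still at or above 50% at the highest dilution: only a lower bound is known.
    return float(pairs[-1][0])


# Hypothetical titration: neutralization falls as the serum is diluted.
example = id50([40, 120, 360, 1080], [90.0, 70.0, 40.0, 10.0])
```

In this made-up titration the 50% crossing lies between the 1:120 and 1:360 dilutions, so the interpolated ID50 falls between 120 and 360.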
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Extended Data Figure 9 | Antibodies 35022 and PGT122: interface with 
HIV-1 Env and comparison of bound and unbound Fab conformations. 
Despite the substantial immune evasion protecting the mature unliganded state 
from humoral recognition, after several years of infection, the human immune 
system does generate broadly neutralizing antibodies. 35022 and PGT122 
are two of these antibodies, which neutralize 62% and 65% of HIV-1 isolates 
at a median IC50 (half maximal inhibitory concentration) of 0.033 and 
0.05 µg ml−1, respectively. Here we provide additional details on 35022 and 
PGT122 recognition. a, 35022 Fab is shown in ribbon representation (purple 
(heavy chain) and white (light chain)). The gp120 subunit is shown in red, 
the gp41 subunit in rainbow (from blue N terminus to orange C terminus), and 
glycans in green sticks. Complementarity determining regions (CDRs) are 
labelled, and interactive HIV-1 Env residues highlighted in semi-transparent 
surface representation. At the membrane-distal surface of 35022, an extended 
framework 3 region (FW3) of the heavy chain (resulting from an insertion 
of 8 residues) interacts with strand β1 of the 7-stranded inner domain sandwich 
of gp120. The heavy chain CDRs form extensive contacts with the N-linked 
glycan extending from residue 88 of gp120. In addition to glycan contacts, the CDR 
H3 of 35022 interacts with the α9 helix of gp41. Helix α9 interactions are also 
made by the FW3 of the light chain (a complete list of contacts is provided 
in Supplementary Table 3). Overall, 35022 buries 1,105 Å² solvent surface on 
gp120 (including 793 Å² with the glycan at Asn 88 of gp120) and 594 Å² solvent 
surface on gp41 (including 127 Å² with the glycan at Asn 618 of gp41). Despite residue 
625 of gp41 being part of the glycan sequon 'NMT', no glycan is observed; indeed, 
the side-chain amide of residue 625 of gp41 hydrogen bonds with the side-chain 
oxygen of Tyr 32 in the 35022 heavy chain, and the presence of an N-linked 
glycan at residue 625 of gp41 is difficult to reconcile with 35022 recognition. 
b, Same colours as a, with 35022 Fab shown in surface representation. c, Same 
colours as a, with 2Fo − Fc at 1σ contour (blue density) shown around glycan 88 
of gp120. Antibody 35022 employs a novel mechanism of glycan-protein 
recognition, combining a protruding FW3 with CDR H1, H2 and H3 to form a 
'bowl' that holds glycan. FW3 and CDR H3 provide the top edges of the 
bowl and interact with the protein surface of gp120, whereas CDR H1 and H2 
are recessed and hold/recognize glycan. This structural mechanism of 
recognition contrasts with the extended CDR H3-draping glycan observed with 
other antibodies that penetrate the glycan shield such as PG9 and PGT128. 
d, Unbound and HIV-1 Env-bound 35022 Fabs were superimposed, and 
ribbon representations and r.m.s.d.s are displayed (panel label: r.m.s.d. all 
atoms, 0.687 Å). Unbound 35022 Fab is 
coloured cyan (heavy chain) and green (light chain), and bound 35022 Fab is 
coloured deep purple (heavy chain) and white (light chain). Regions that 
showed conformational changes are highlighted with black dotted lines. We 
note that in the 35022-bound conformation, density is poor and/or sparse for 
the Fc portion of the Fab. e, PGT122 interface details. Ribbon representation 
of PGT122 Fab in blue (heavy chain) and light blue (light chain) interacting 
with one gp120 subunit, shown in red with glycans in green sticks. 
CDRs are labelled, and interactive HIV-1 Env residues highlighted in 
surface representation. Primary contacts between antibody PGT122 and 
N-linked glycan involve N137 and N332, with minor contact with N156. 
Although portions of glycan N301 can be observed in the electron density, 
no direct contacts with PGT122 are observed; a complete list of contacts 
between PGT122 and BG505 SOSIP.664 is provided in Supplementary Table 4. 
f, Same colours as e, with PGT122 Fab shown in surface representation. 
g, Same colours as e, with 2Fo − Fc at 1σ contour (grey density) shown 
around glycan 332 of gp120. h, Comparison of bound and unbound PGT122 
Fab conformations. Unbound and HIV-1 Env-bound Fabs were superimposed, 
and ribbon representations and r.m.s.d.s are displayed (panel label: heavy 
chain and light chain; r.m.s.d. all atoms, 1.268 Å). Unbound PGT122 
Fab is coloured cyan, and bound PGT122 Fab blue (heavy chain) and light blue 
(light chain). Regions which showed conformational changes are highlighted 
with black dotted lines. 

Extended Data Figure 10 | Structural implementation of HIV-1 molecular 
trickery. The pre-fusion HIV-1 Env trimer (left) is displayed with evasion 
mechanisms and their structural implementation (right). The gp120 subunit 
is shown in red, the gp41 subunit in rainbow (from blue N terminus to orange C 
terminus), and crystallographically defined glycans in green. One protomer 
is shown with Cα trace and glycans in stick representation; a second protomer 
is shown in ribbon representation with secondary structure elements labelled; 
and the third protomer is shown in light grey surface. The MPER region for 
each protomer is shown as a stylized helix associated with the viral membrane. 
The location of secondary structural elements, termini, and residues called in 
the text has been labelled (red font for gp120 and black font for gp41). 
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Extended Data Table 1 | Data collection and refinement statistics 


BG505 SOSIP.664 with PGT122 and 35022 
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Extended Data Table 2 | Modelling parameters for gp120 and gp41 rearrangements 


Columns: Pre-fusion mature closed state | Pre-fusion partially open intermediate | Pre-fusion receptor-bound open intermediate | Post-fusion 

gp120    | Crystal structure (4TVP) | Crystal structure (4TVP) | Crystal structure of core (3JWD) with modeled V3 (3HI1) and V1V2 (3U4E) | Crystal structure of core (3JWD) with modeled V3 (3HI1) and V1V2 (3U4E) 
V1V2     | Native | Rotated 6° | Rotated to align with bridging sheet | [unreadable] 
V3       | Native | Rotated 6° | Protruding towards target cell | Protruding towards target cell 
Core     | Native | Rotated 6° | Rotated 50° | Rotated 50° 
N+C-term | Native | Native | Unknown | Rotated 45° 
gp41     | Crystal structure (4TVP) | Crystal structure (4TVP) with modeled α7 | Crystal structure (4TVP) with modeled α7 and α6 removed | Crystal structure (chimaera of 2X7R and 2EZO) 
α6       | Native | Disassembling to α7 | Disassembled | Extended to post-fusion HR1 
α7       | Native | Extending | Extended with fusion peptide | Extended to post-fusion HR1 
α8       | Native | Native | Native | Extended to post-fusion HR2 
α9       | Native | Native | Native | Extended to post-fusion HR2 


To provide reference frames for the various pre-fusion conformational states, we extracted Env component of SOSIP bound by VRC03 and by VRC-PG04 (also called PGV04), and aligned the resultant maps with the CD4-bound conformation of trimeric BAL (using Chimera http://www.cgl.ucsf.edu/chimera/). Once maps were aligned, gp120 and gp41 models were fit to each of the maps as defined in the table above in black text to the right of labels 'gp120' and 'gp41'. In addition to rigid-body fits of crystal structures, specific regions of gp120 and gp41 were modelled. These are defined in the table above in red text after different portions of gp120 and gp41 relative to the pre-fusion mature closed state. The 35022-/PGT122-bound BG505 SOSIP.664 structure analysed here was found by smFRET to closely resemble the pre-fusion unliganded conformation. It is 'mature', with the C terminus of gp120 cleaved from the N terminus of gp41. We do not know the structure of gp41 in the uncleaved state, but antigenic differences with the mature cleaved state suggest a distinct gp41 conformation; in our pre-fusion HIV-1 Env structure, the observed C terminus of gp120 at residue 505 and N terminus of gp41 at residue 518 are 37 Å apart, a distance which cleavage may help the pre-fusion structure to accommodate. Finally, the interactions between V1V2 and V3 at the trimer apex indicate a closed conformation. Altogether, the crystal structure defined here (PDB ID 4TVP) is thus in the pre-fusion mature closed state. Recently, hydrogen-deuterium exchange and oxidative labelling were used by Guttman et al. to identify regions in BG505 SOSIP.664 with ordered secondary structure and/or with altered protection upon CD4 binding; these data agree with both structures and modelling presented here. We fit 4TVP without modification to EMDB-5779 with density from VRC-PG04 Fabs computationally removed. The pre-fusion partially open intermediate conformation was modelled by a rigid body fitting of gp120 to EMDB-2484 with density from VRC03 Fabs computationally removed. α7 of gp41 was extended into the unoccupied density at the N terminus of the helix using the mature closed structure as a starting model. The pre-fusion receptor-bound intermediate was modelled by fitting the CD4-bound gp120 core crystal structure (PDB ID 3JWD) to the CD4-bound EMDB-5455 map. V3 of the crystal structure (PDB ID 3HI1) was aligned to the core and the V1V2 crystal structure (PDB ID 3U4E) was fit to the remaining density. α7 of gp41 was extended through an alignment with crystal structures of post-fusion gp41 (PDB IDs 2X7R, 2EZO). Post-fusion gp120 is in the same conformation as the pre-fusion receptor-bound intermediate and the post-fusion gp41 structure was derived from an alignment of SIV and HIV post-fusion crystal structures (PDB IDs 2X7R, 2EZO). Note that the fit of the BG505 SOSIP.664 structure defined here to the pre-fusion mature closed state was very good, the fit to the pre-fusion partially open intermediate was good except for the V1V2 region and the membrane-proximal region of gp41, and the fit of the CD4-bound core gp120 to the pre-fusion receptor-bound open intermediate state was similar to [...] only approximate the actual molecular motions between conformations. 

Two families of exocomets in the β Pictoris system 


F. Kiefer, A. Lecavelier des Etangs, J. Boissier, A. Vidal-Madjar, H. Beust, A.-M. Lagrange, G. Hébrard & R. Ferlet 


The young planetary system surrounding the star β Pictoris harbours 
active minor bodies. These asteroids and comets produce a large 
amount of dust and gas through collisions and evaporation, as happened 
early in the history of our Solar System. Spectroscopic observations 
of β Pictoris reveal a high rate of transits of small evaporating 
bodies, that is, exocomets. Here we report an analysis of more 
than 1,000 archival spectra gathered between 2003 and 2011, which 
provides a sample of about 6,000 variable absorption signatures arising 
from exocomets transiting the disk of the parent star. Statistical 
analysis of the observed properties of these exocomets allows us to 
identify two populations with different physical properties. One family 
consists of exocomets producing shallow absorption lines, which 
can be attributed to old exhausted (that is, strongly depleted in volatiles) 
comets trapped in a mean motion resonance with a massive 
planet. Another family consists of exocomets producing deep absorption 
lines, which may be related to the recent fragmentation of one or 
a few parent bodies. Our results show that the evaporating bodies 
observed for decades in the β Pictoris system are analogous to the 
comets in our own Solar System. 

From 2003 to 2011, a total of 1,106 spectra of β Pictoris were obtained 
using the HARPS (High Accuracy Radial velocity Planet Searcher) 
spectrograph. Observations of the calcium (Ca II) doublet (the Ca II 
K-line at 3,933.66 Å and the Ca II H-line at 3,968.47 Å) show a large 
number of variable absorption features (Fig. 1) varying on timescales 
of one to six hours. These features, simultaneously detected in both Ca II 
K and Ca II H lines, are interpreted as exocomets transiting in front of 
the stellar disk. Since the β Pic Ca II spectrum is typically observed to 
be stable on 30-min timescales, we averaged together spectra in distinct 
10-min time intervals to limit any possible spectral variability. This results 
in a total of 357 spectra with signal-to-noise ratio greater than 80. 
To characterize the profile of these transient absorption lines, we divided 
each of the 357 averaged spectra by a reference spectrum of β Pictoris 
(Extended Data Figs 1 and 2) assumed to be free of the absorption signatures 
of transiting exocomets. 

Figure 1 | A typical Ca II spectrum of β Pictoris. a, Ca II K-line (3,934 Å). 
b, Ca II H-line (3,968 Å). A typical Ca II spectrum of β Pic (black line) collected 
on 27 October 2009 is shown together with the derived β Pic stellar spectrum 
(red line) used as the reference spectrum free of variable absorption features. 


Given the HARPS resolution and sensitivity, each β Pictoris spectrum 
shows an average of about six variable absorption features that are due 
to exocomets. These features have radial velocities ranging from −150 
to +200 km s−1 with respect to the β Pictoris heliocentric radial velocity 
(~20 km s−1). We fitted each feature with a Gaussian profile and 
obtained the estimates of p_K and p_H (their depths in the Ca II K and H 
lines), v_r (the radial velocity of the absorbing cloud) and Δv (the line's full 
width at half maximum (FWHM) expressed in units of radial velocity). 
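
As a rough illustration of the quantities named above, the following sketch builds a single Gaussian absorption feature in a normalized spectrum and reads back its depth, radial velocity and FWHM. It is only a toy: the function names are hypothetical, the half-depth width estimate stands in for a proper least-squares fit, and the source does not describe the fitting procedure beyond the use of a Gaussian profile. The one standard relation used is FWHM = 2·sqrt(2 ln 2)·σ.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_absorption(v, depth, v_r, fwhm):
    """Normalized flux at velocity v for one Gaussian absorption feature."""
    sigma = fwhm_to_sigma(fwhm)
    return 1.0 - depth * math.exp(-0.5 * ((v - v_r) / sigma) ** 2)


def estimate_feature(velocities, flux):
    """Crude (depth, v_r, fwhm) estimate for a profile with a single feature.

    Illustrative stand-in for a fit: depth and v_r come from the deepest
    sample, the FWHM from the width at half depth.
    """
    absorb = [1.0 - f for f in flux]
    i_max = max(range(len(absorb)), key=absorb.__getitem__)
    depth = absorb[i_max]
    half = depth / 2.0
    lo = i_max
    while lo > 0 and absorb[lo] > half:
        lo -= 1
    hi = i_max
    while hi < len(absorb) - 1 and absorb[hi] > half:
        hi += 1
    return depth, velocities[i_max], velocities[hi] - velocities[lo]


# Synthetic feature: depth 0.3 at +15 km/s with a 20 km/s FWHM (made-up values).
velocities = list(range(-100, 101))
flux = [gaussian_absorption(v, 0.3, 15.0, 20.0) for v in velocities]
depth_est, v_r_est, fwhm_est = estimate_feature(velocities, flux)
```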

A large number of the cometary gaseous clouds that pass in front of 
the star and produce the absorption features observed are smaller than 
the stellar disk. Therefore, the p_K and p_H of each feature depend on A, 
the Ca+ cloud's opacity (absorption depth), and α = Σc/Σ⋆, the ratio of 
the area of the cloud Σc over the area of the stellar disk Σ⋆. The simultaneous 
fit of the K and H lines yields a non-degenerate determination 
of α, A, v_r and Δv for each feature (Extended Data Fig. 3). 
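
To see why two line depths suffice to separate cloud size from opacity, consider a minimal sketch under an assumed model that is not spelled out in this passage: a uniform cloud covering a fraction α of the stellar disk, with p_K = α(1 − e^(−A)) and p_H = α(1 − e^(−A/2)), the factor of two reflecting the standard 2:1 ratio of the Ca II K and H oscillator strengths. Under that assumption p_H/p_K = 1/(1 + e^(−A/2)), which can be inverted in closed form. The function name `cloud_parameters` is hypothetical.

```python
import math


def cloud_parameters(p_K, p_H):
    """Recover (alpha, A) from the K and H line depths.

    Assumed model (labelled as an assumption, not taken from the text):
        p_K = alpha * (1 - exp(-A))
        p_H = alpha * (1 - exp(-A / 2))
    so that p_H / p_K = 1 / (1 + exp(-A / 2)).
    """
    if not (0.0 < p_H < p_K):
        raise ValueError("expected 0 < p_H < p_K")
    ratio = p_H / p_K
    if ratio <= 0.5:
        raise ValueError("p_H / p_K must exceed 0.5 (optically thin limit)")
    s = 1.0 / ratio - 1.0          # s = exp(-A / 2), with 0 < s < 1
    A = -2.0 * math.log(s)
    alpha = p_K / (1.0 - s * s)
    return alpha, A


# Round trip with made-up values: alpha = 0.3, A = 2.
p_K = 0.3 * (1.0 - math.exp(-2.0))
p_H = 0.3 * (1.0 - math.exp(-1.0))
alpha_est, A_est = cloud_parameters(p_K, p_H)
```

In this picture a ratio p_H/p_K close to 0.5 indicates an optically thin cloud, while a ratio close to 1 indicates a saturated one, for which the depth is set mainly by α.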

Because the transit of an orbiting exocomet can last several hours, we 
considered the measurements derived from the fit of only one spectrum 
per observation day. This ensured that each set of measurements was 
linked to a different independent object. We thus collected a total of 570 
individual sets of measurements from independent transiting cometary 
clouds. Of these 570 detections, we discarded variable absorption features 
compatible with p_K,H < 0.01 and Δv < 3 km s−1, in order to avoid 
contamination introduced by fitting spurious features. We thus end up 
with a total of 493 detected cometary clouds. For the statistical analysis, 
we also excluded detections with α = 1, corresponding to clouds covering 
the full stellar disk and for which some physical characteristics, like 
the transit distance, cannot be derived. 
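
The selection just described can be sketched as two filters. The record layout (dictionaries with keys `p_K`, `p_H`, `fwhm`, `alpha`) and the function names are hypothetical, and the reading of the rejection rule (a feature is dropped only when both line depths are below 0.01 and its FWHM is below 3 km s−1) is an assumption about how the two criteria combine.

```python
def select_clouds(features, min_depth=0.01, min_fwhm=3.0):
    """Drop features compatible with being spurious.

    Assumption: a feature is rejected only when both line depths are
    below min_depth AND its FWHM (km/s) is below min_fwhm.
    """
    kept = []
    for f in features:
        spurious = (
            f["p_K"] < min_depth
            and f["p_H"] < min_depth
            and f["fwhm"] < min_fwhm
        )
        if not spurious:
            kept.append(f)
    return kept


def statistical_sample(clouds):
    """Keep only clouds smaller than the stellar disk (alpha < 1)."""
    return [c for c in clouds if c["alpha"] < 1.0]


# Made-up detections for illustration only.
detections = [
    {"p_K": 0.005, "p_H": 0.004, "fwhm": 2.0, "alpha": 0.1},   # rejected as spurious
    {"p_K": 0.20, "p_H": 0.12, "fwhm": 40.0, "alpha": 0.25},   # kept
    {"p_K": 0.60, "p_H": 0.55, "fwhm": 8.0, "alpha": 1.0},     # kept, but alpha = 1
]
clouds = select_clouds(detections)
sample = statistical_sample(clouds)
```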

The plot of the absorption depth A as a function of the surface ratio α 
(Fig. 2) shows a depletion of cometary clouds with 0.2 < α < 0.5 and 
A ≳ 3 (or log A ≳ 0.5). This depletion divides the data into two well separated 
clusters, revealing the existence of two distinct populations of 
exocomets. Statistical cluster analysis in the (p_H, p_K) diagram (Extended 
Data Fig. 4) shows that these two populations can be distinguished by 
the value of p_K. The first population, called 'population S', corresponding 
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Radial velocities are given with respect to the star’s rest frame. CS indicates the 
circumstellar disk contribution, while solid black lines indicate the changes in 
flux caused by the transiting exocomets. Each transiting exocomet produces 
an absorption signature detected at the same radial velocity in both Ca II lines. 
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Figure 2 | Coma absorption depths as a function of surface ratio for 
transiting exocomets. The absorption depths A (a dimensionless number) of 
252 exocomets detected with α < 1 are shown on a logarithmic scale as a 
function of the surface ratio α, representing the cloud sizes in units of the stellar 
disk's area. Small symbols correspond to data taken in 2003 and large 
symbols to 2011 data. Error bars represent the standard deviation. The 147 
exocomets producing shallow absorption lines (p_K < 0.4), the so-called 
population S, are plotted in red, while the 105 exocomets producing deep 
absorption lines (p_K > 0.4), the so-called population D, are plotted in blue. 
The cloud sizes show a bimodal distribution with a deficiency of exocomets 
with high absorption depths at intermediate sizes. 


mainly to clouds with small surface ratio (« ~ 0.1), produces shallow 
absorption lines (px < 0.4) and the second population, corresponding 
to clouds with large surface ratio (x ~ 0.8), called ‘population D’, pro- 
duces deep absorption lines (px > 0.4). These two populations present 
highly different physical properties (Fig. 3). 

They have different radial velocity and FWHM distributions. Exocomets
of population S have a broad distribution of radial velocities with
v_r,S ≈ 36 ± 55 km s⁻¹, whereas the population D exocomets have a
narrow distribution with v_r,D ≈ 15 ± 6 km s⁻¹. Moreover, population S has
a broad distribution of FWHM with Δv_S ≈ 55 ± 55 km s⁻¹, while
population D has a peaked distribution with Δv_D ≈ 7 ± 3 km s⁻¹. Since
the width of the absorption line is expected to decrease as the distance
between the exocomet and host star increases, the bimodal FWHM
distribution indicates that population S exocomets transit at shorter
distances than the exocomets of population D. Furthermore, the narrow
distribution of radial velocities for population D suggests that these
exocomets are gathered on neighbouring orbits with similar longitude
of periastron relative to the line of sight, in contrast with the exocomets
of population S which seem to be scattered on a wide variety of orbits.

But it is not only different orbital characteristics that distinguish the
two populations. The spatial extent of the cometary gaseous cloud and
the calcium production rate of a comet both depend on the distance to
the star, so the resulting Ca II absorption depth also depends on the
distance. Thus, if the exocomets of population D had the same intrinsic
properties as exocomets of population S and were only transiting at
farther distances, then one would expect to measure significantly smaller
Ca II absorption depths, which is not what is observed in Fig. 3c. Moreover,
if the observed exocomets were originating from a single family
spread over a wide range of orbital distances, we would expect a
continuum of measurements in any of the quantities presented in Fig. 3. On
the contrary, all histograms show a dichotomy in these measurements,
which confirms the existence of two families of exocomets orbiting
β Pictoris.

The efficiency of converting stellar irradiance incident on a comet into
the evaporation of gas (mainly water) and dust from its core depends
on the surface properties and size of its nucleus. In support of the above
interpretations, estimates of this evaporation efficiency η show that
population D exocomets would have a dust production rate about ten times
higher if they were located at the same distance to the star as population
S exocomets (Extended Data Fig. 5).

Coupling the evaporation efficiency with a dynamical model of
evaporation allows the distance to the star at the time of transit, d, and
the dust production rate Ṁ of each detected exocomet to be derived.
We find that population D comets orbit twice as far away from β Pic as
the population S comets, with d_D ≈ 19 ± 4 R⋆ and d_S ≈ 10 ± 3 R⋆, as
expected from their lower FWHM (Extended Data Fig. 6), whereas the
dust production rates follow Ṁ_D/Ṁ_S ≈ 2. These results show that
exocomets of population D present more active surfaces than exocomets
of population S; this could be explained either by the nuclei being larger
in size or by the nuclei being disrupted, thus exposing fresh layers of ice
buried in their core.

Figure 4 | Periastron distance versus periastron longitude. The exocomets
of population D (shown in blue) have large periastron distances, Q_D ≈ 18 ± 4 R⋆,
and a narrow range of longitudes, ϖ_D ≈ 7 ± 8°. This population of exocomets
could originate from the break-up of some parent bodies, which would liberate
a large amount of fresh volatiles buried in the cometary core, thus enhancing the
gas and dust evaporation rates. The exocomets of population S (shown in red)
have smaller periastron distances, Q_S ≈ 9 ± 3 R⋆, and a wide range of longitudes,
ϖ_S ≈ 22 ± 25°. The Q-versus-ϖ relationship predicted for comets trapped in
4:1 mean motion resonance with a massive planet (m′ ≈ 10 M_Jup, a′ ≈ 4.5 au,
ϖ′ ≈ −40°, e′ ≈ 0.04) is shown as black solid lines. Error bars represent the
standard deviation.

Given that the radial velocities are directly measured from the
spectra, and assuming that exocomets have near-parabolic orbits, the
combination of the distance and the radial velocity yields Q, the
periastron distance, and ϖ, the longitude of the periastron. The plot of Q as
a function of ϖ shows that orbital properties also differ between the two
families (Fig. 4). Exocomets of population D have larger periastron
distances than exocomets of population S, with Q_D ≈ 18 ± 4 R⋆ and
Q_S ≈ 9 ± 3 R⋆. They also present a narrow distribution of longitude of
periastron, indicating that all population D exocomets share similar orbits,
with ϖ_D ≈ 7 ± 8°. This concentration of a large number of bodies on
similar orbits with a nearly constant longitude of periastron can be
explained by the disruption of one or a few individual exocomets. These
observations resemble the Kreutz family comets in our own Solar System,
which are detected with periastron distances ranging from 0.005 au to
0.01 au and periastron longitude ranging from 10° to 90°.
Conversely, population S follows, as expected, a much broader
distribution of longitude of periastron, with ϖ_S ≈ 22 ± 25°. The arc-like
structure in Fig. 4 suggests that a fraction of the exocomets of
population S present a strong correlation between the periastron distances
and the longitudes: starting at log Q ≈ 1.0 and ϖ ≈ 50°, the periastron
distances decrease to log Q ≈ 0.4 and ϖ ≈ 100°. This is exactly the
behaviour predicted for comets trapped in a mean motion resonance
with a massive planet (see figure 11 of ref. 18) such as β Pic b (refs 19
and 20). In this scenario, the lower evaporation rate of population S
exocomets is explained by the exhaustion of volatiles at the surface of
their nucleus caused by a large number of periastron passages as they
evolve towards highly eccentric orbits.
Online Content Methods, along with any additional Extended Data display items
and Source Data, are available in the online version of the paper; references unique
to these sections appear only in the online paper.

METHODS


This section describes the data analysis, the derivation of the physical properties of 
the exocomets and details of the interpretation. 

Spectra. HARPS is a high-resolution (about 115,000), high-precision (<1 m s⁻¹)
spectrograph installed at the 3.6-m European Southern Observatory (ESO)
telescope located at La Silla, Chile. The spectra were extracted from the detector images
with the Data Reduction System pipeline of HARPS, which includes localization of
the spectral orders on the two-dimensional images, optimal order extraction,
cosmic-ray rejection, wavelength calibration, flat-field corrections, and one-dimensional
reconnection of the spectral orders after correction for the blaze. A typical HARPS
spectrum of β Pictoris includes the Ca II doublet lines (3,933.66 Å and 3,968.47 Å),
which show contributions from (1) the rotationally broadened stellar lines, (2) the
circumstellar gaseous disk, and (3) variable absorption features (Fig. 1).

To compare β Pictoris spectra collected at different epochs, we first normalized
all the spectra to the same mean flux level using the mean of the flux in the wings of
the Ca II lines, where no variable absorption features are present. We then checked
for possible shifts in wavelength calibration with time by using the Na I lines as a
reference. The circumstellar Na I line is steep (Extended Data Fig. 1) and confirms the
tremendous stability of the instrument during the observation campaign, as
expected for this spectrograph, which is aimed at detecting minute radial velocity
variations. Table 2 in ref. 23 shows that the accuracy of HARPS is better than 1 m s⁻¹
over several years. We thus have high confidence in the detected spectral variations.

We also checked for stellar variations in the Ca II spectrum over long timescales.
To do so, we computed a reference spectrum of the Ca II stellar lines (see next
paragraph) for each observational campaign (2003-2004, 2004-2008, 2008-2009,
2009-2011 and 2011). A thorough comparison of these reference spectra allowed us to
exclude any significant variation of the stellar lines between 2003 and 2011. We
hence decided to use the whole set of spectra to compute one common reference
spectrum.

However, variations of the circumstellar linewidth by about 3 km s⁻¹ are seen
between 2003 and 2011. Since the HARPS spectrograph is not capable of resolving
features below 3 km s⁻¹ at the Ca II doublet's wavelengths, we discarded the part of
the spectra corresponding to the circumstellar line region extending from 18 km s⁻¹
to 24 km s⁻¹ around the circumstellar line centre at 21 km s⁻¹, the systemic radial
velocity of β Pic.
Derivation of the reference spectrum. To characterize the absorption features,
we divided each observed β Pic spectrum F_obs(λ) by a reference spectrum F_ref(λ). The
reference spectrum was obtained as described below and includes both stellar and
circumstellar absorption components in the Ca II doublet. In the absence of
transiting exocomets, the normalized spectrum F_obs/F_ref is constant and equal to 1.

Each Ca II spectrum of β Pic shows many variable absorption features of different
depth and width. As a consequence, none of the 1,106 spectra could be considered
as an estimate of the reference spectrum. However, at any given wavelength λ_i around
the stellar Ca II K and H lines, variable features appear and disappear randomly. As
a result, amongst the whole set of flux measurements F_k(λ_i) (for k = 1, ..., N =
1,106), a small fraction has no or little contamination by variable features, and can
be used to compute the reference spectrum. We further assumed that, in the
absence of variable absorption features, the noise δF in the flux measurements is
Gaussian, and we checked that the root mean square (r.m.s.) of the measurements
is proportional to √F, with a constant factor independent of the wavelength.

First, we obtained an estimation of the reference spectrum by considering at each
wavelength λ_i only the 2.5% highest flux values. The measurements were sorted
in decreasing order such that F_k(λ_i) ≥ F_{k+1}(λ_i) (for k = 1, ..., N − 1 =
1,105). The 2.5% highest flux values (k ≤ 28) are probably not contaminated by
the variable absorption features and can be considered as an upper limit randomly
drawn from the Gaussian distribution of the noise centred around the true
reference flux. In this case, we can estimate the difference between any F_k(λ_i) and the
reference spectrum F_ref(λ_i). Above F_k(λ_i) there are k flux measurements that are a
fraction k/N of the total number of flux measurements at the given wavelength.
Assuming that the noise δF is Gaussian with standard deviation σ = r.m.s., we
compute the cut-off value α_k at which the probability that δF > α_k × r.m.s. is k/N. We
then obtain a first estimate of the reference flux:

$$ F^{(1)}_{{\rm ref},k}(\lambda_i) = F_k(\lambda_i) - \alpha_k \times {\rm r.m.s.} \qquad (1) $$
However, variable absorption features appear randomly at each wavelength,
disturbing the derivation of the reference spectrum. To improve it, we introduced a
second step in the computation using a larger number of flux values. For each k
and wavelength λ_i, we computed the mean value F̄_k(λ_i) of the flux measurements
F_p(λ_i) for p such that F^(1)_ref,k(λ_i) < F_p(λ_i) < F_k(λ_i). These flux measurements exceed
the first estimate of the reference spectrum and are therefore not likely to be affected
by variable absorption lines. This new step takes into account up to 500 flux
measurements in the computation of the reference flux, to be compared to the 28 most
extreme flux values used in the first step. Using the value F̄_k(λ_i), we obtain a new
estimate of the reference spectrum given by:

$$ F^{(2)}_{{\rm ref},k}(\lambda_i) = \bar{F}_k(\lambda_i) - \beta_k \times {\rm r.m.s.} \qquad (2) $$

where β_k is the average value of a normalized Gaussian variable in the range [0, α_k],
given by:

$$ \beta_k = \frac{\int_0^{\alpha_k} x\, e^{-x^2/2}\, dx}{\int_0^{\alpha_k} e^{-x^2/2}\, dx} \qquad (3) $$

Finally, we compute the final reference spectrum by taking the average of all the
F^(2)_ref,k, with k varying from 3 to 28:

$$ F_{\rm ref} = \left\langle F^{(2)}_{{\rm ref},k} \right\rangle_k \qquad (4) $$
The application of this three-step method at all wavelengths λ_i of the Ca II spectrum
allowed us to obtain the reference spectrum plotted in Extended Data Fig. 2. Its accuracy is such
that a small interstellar line is detected on the left of the circumstellar line. We
performed at each wavelength a χ² test to compare the tail distribution of flux
measurements going from F_ref to F_ref + 3σ with a Gaussian distribution of photon noise.
The agreement is good, with 87% of the reference flux values passing the test at a
significance level of 5%.

Fitting method. We obtained normalized spectra by dividing each spectrum by the
reference spectrum. The normalized spectra show exclusively variable absorption
features, as can be seen by comparing Extended Data Figs 1 and 3. Spectroscopic
variations were typically not seen on timescales less than 30 min (corresponding
to the minimum duration of a transit); however, to limit the effects of any possible
spectral variability, the spectra were averaged into separate 10-min time intervals.
We thus obtained 357 re-sampled spectra with signal-to-noise ratio >80. Each of
them contains an average of six variable absorption features with radial velocities
between −150 km s⁻¹ and 200 km s⁻¹ with respect to β Pic. These features can be
fitted by a Gaussian profile:

$$ \frac{F_{\rm obs}}{F_{\rm ref}}(v) = 1 - p_{K,H}\,\exp\left[-4\ln 2\,\frac{(v - v_0)^2}{\Delta v^2}\right] \qquad (5) $$
where p_K and p_H are the line depths in the Ca II K and H lines, respectively, v_0 is the
radial velocity of the coma and Δv is the line's FWHM. Because the absorption
features are produced by gaseous clouds passing in front of the star, the depths p_K
and p_H are related to the cloud-to-star surface ratio α = Σ_c/Σ⋆ ≤ 1 and the optical
depth at the centre of the absorption feature in the Ca II K line, or absorption depth
A, by:

$$ p_K = \alpha\left(1 - e^{-A}\right), \qquad p_H = \alpha\left(1 - e^{-A/2}\right) \qquad (6) $$

The absorption depth A depends on the density and depth of the medium. This
quantity is directly related to the p_K/p_H ratio characterizing the saturation in absorption
within the cloud:

$$ \frac{p_K}{p_H} = 1 + e^{-A/2} \qquad (7) $$

With α ≤ 1, the relationship between equations (6) and (7) becomes:

$$ 1 \le \frac{p_K}{p_H} \le 2 - p_H \qquad (8) $$
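
As an illustration of how the two line depths fix the two cloud parameters, the relations p_K = α(1 − e^(−A)) and p_H = α(1 − e^(−A/2)) can be inverted in closed form, since p_K/p_H = 1 + e^(−A/2). The Python sketch below is only a worked example under that reading of the equations; the function names are hypothetical and the authors' actual fit is a simultaneous profile fit, not this point inversion.

```python
import math


def line_depths(alpha, absorption_depth):
    """Depths of the Ca II K and H lines for surface ratio alpha and depth A."""
    p_k = alpha * (1.0 - math.exp(-absorption_depth))
    p_h = alpha * (1.0 - math.exp(-absorption_depth / 2.0))
    return p_k, p_h


def cloud_parameters(p_k, p_h):
    """Invert the depth relations for (alpha, A).

    Requires 1 < p_K/p_H < 2: a ratio of 1 is the fully saturated limit
    (A -> infinity) and a ratio of 2 is the optically thin limit (A -> 0).
    """
    ratio = p_k / p_h
    if not 1.0 < ratio < 2.0:
        raise ValueError("p_K/p_H must lie strictly between 1 and 2")
    absorption_depth = -2.0 * math.log(ratio - 1.0)   # from p_K/p_H = 1 + exp(-A/2)
    alpha = p_h / (2.0 - ratio)                       # from p_H = alpha (1 - exp(-A/2))
    return alpha, absorption_depth


# Round trip with illustrative values close to population S (alpha ~ 0.1).
p_k, p_h = line_depths(0.1, 3.0)
alpha, depth = cloud_parameters(p_k, p_h)
print(f"p_K = {p_k:.4f}, p_H = {p_h:.4f} -> alpha = {alpha:.3f}, A = {depth:.3f}")
```

The condition α ≤ 1 is equivalent to p_K/p_H ≤ 2 − p_H, which is why clouds covering the whole stellar disk sit on a limiting curve in the (p_H, p_K) diagram.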


We fitted each variable absorption feature simultaneously in the Ca II K line and
Ca II H line of all 357 normalized spectra, providing non-degenerate estimates of α,
A, v_0 and Δv for each variable absorption component. The huge number of spectra
and the large number of blended components in each spectrum make the fitting
of these features challenging. We developed a systematic procedure which we used
to fit each spectrum automatically by searching for as many lines as required to best
fit the data. Since the prior on the number of components is a uniform probability
function, we used the Bayesian information criterion (BIC) to get the optimal
number of components necessary to build a fair model of the normalized spectrum. The
BIC is defined as BIC = χ² + k ln N, where N is the total number of data points and
k is the number of parameters. For each spectrum, we took a number n of
components yielding a BIC with a value BIC_n, such that the fit with an additional
component yields a BIC_{n+1} that does not decrease by more than 6. When BIC_n minus
BIC_{n+1} is less than 6, the model with n + 1 components has a less than 95%
probability of being closer to reality than the model with n components. A typical example
of a resulting fit using this procedure is plotted in Extended Data Fig. 3.
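
The stopping rule can be sketched in a few lines of Python. This is an illustration only: the function names and the χ² values are hypothetical, and the count of four parameters per component (α, A, v_0, Δv) is taken from the description of the fit above.

```python
import math


def bic(chi2, n_params, n_points):
    """Bayesian information criterion, BIC = chi^2 + k ln N."""
    return chi2 + n_params * math.log(n_points)


def select_components(chi2_by_n, n_points, params_per_component=4, threshold=6.0):
    """Return the number of components at which adding one more is not justified.

    chi2_by_n[n] is the chi-square of the best fit with n components
    (n = 0, 1, 2, ...). Components are added as long as the BIC decreases
    by more than `threshold`.
    """
    n = 0
    while n + 1 < len(chi2_by_n):
        current = bic(chi2_by_n[n], params_per_component * n, n_points)
        richer = bic(chi2_by_n[n + 1], params_per_component * (n + 1), n_points)
        if current - richer <= threshold:
            break
        n += 1
    return n


# Hypothetical chi-square values for fits with 0 to 4 components on 400 points.
chi2_values = [900.0, 500.0, 380.0, 370.0, 368.0]
best_n = select_components(chi2_values, n_points=400)
print("selected number of components:", best_n)
```

With these made-up numbers the third component lowers χ² by only 10, less than the penalty 4 ln 400 ≈ 24, so the procedure stops at two components.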

Separation of the populations. Figure 2 reveals the presence of two clusters of data
separated at log A ≈ 0.5; one at small surface ratio (α < 0.2) and the other one at
a larger surface ratio (α > 0.5). Extended Data Fig. 4 also suggests the presence of
two clusters of features in the distribution of the depth of the K and H lines. We
performed a statistical cluster analysis using the k-mean cluster algorithm in the
(p_H, p_K) diagram. This algorithm identified two clusters of data with p_K < 0.4 on
one side and p_K > 0.4 on the other side. These two clusters are directly related to the
two clusters in Fig. 2. We performed a Kolmogorov-Smirnov test to compare the
distributions in parameters α, A, v_0 and Δv of these two clusters. For each of these
parameters, the two distributions are different with a probability P > 0.9999. A
similar statistical cluster analysis performed in (A, α) space led to an analogous
separation of the two populations.

Evaporation efficiency. The quantities measured in the ionic Ca⁺ cloud transiting
in front of β Pictoris can be used to derive the physical properties of the exocomets
such as their distance to the star and the dust production rate in their coma. To do
so, we estimate the evaporation efficiency of each individual detected exocomet, a
quantity describing the efficiency with which a comet captures and reprocesses input
stellar energy flux into dust and gas evaporation from its nucleus.

Definition. After being damped by the opacity of the dust coma, the stellar
radiation incident on the comet reaches the icy nucleus surface covered by a thin layer
of agglomerated dust, and heats it up. A significant part of this heat energy is used
to sublimate water from the ice of the nucleus, at a rate Z_H2O; the remaining energy
is absorbed by the crust or re-emitted by the surface.

The heat re-emitted or absorbed by the dusty surface is in part used to
thermalize the sublimated gas. Assuming that the distance d of a comet to β Pic is around
10 R⋆ (where R⋆ is the stellar radius) and given that T⋆ ≈ 8,000 K, the temperature at
the surface of the comet is T_c = T⋆ √(R⋆/d) ≈ 2,500 K. As the gas heats up, it escapes from the
nucleus surface and flows out radially. A water molecule heated to 2,500 K is
typically accelerated to a radial velocity of about 1 km s⁻¹, given that v_g ≈ √(3 k_B T/m_H2O),
where k_B is the Boltzmann constant, T is the temperature and m_H2O is the mass of
one water molecule. While escaping, the gas picks up dust grains from the dust
mantle and drags them outward, with a mass rate Ṁ. Kinetic energy is thus
transferred from the gas to the dust grains, which are then accelerated to the expansion
velocity of water molecules near the nucleus, v_g ≈ 1 km s⁻¹.
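
The orders of magnitude quoted in this paragraph can be checked with a short Python calculation. It is a back-of-the-envelope sketch: the function names are hypothetical, the physical constants are standard values, and the stellar temperature and distance are those quoted in the text.

```python
import math

K_B = 1.380649e-23                 # Boltzmann constant, J/K
M_H2O = 18.015 * 1.66053907e-27    # mass of one water molecule, kg


def surface_temperature(d_stellar_radii, t_star=8000.0):
    """T_c = T_star * sqrt(R_star / d), with d expressed in stellar radii."""
    return t_star * math.sqrt(1.0 / d_stellar_radii)


def thermal_velocity(temperature):
    """v_g ~ sqrt(3 k_B T / m_H2O), in m/s."""
    return math.sqrt(3.0 * K_B * temperature / M_H2O)


t_c = surface_temperature(10.0)
v_g = thermal_velocity(t_c)
print(f"T_c = {t_c:.0f} K, v_g = {v_g / 1e3:.1f} km/s")
```

The result is a temperature near 2,500 K and a velocity of order 1 km s⁻¹ (somewhat below 2 km s⁻¹), consistent with the rounded values used in the text.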

The efficiency of this evaporation process, which depends on the surface
properties of the nucleus (such as size, albedo and fragmentation), can be measured by
comparing the total energy used per unit time to evaporate dust at a mass rate Ṁ
with the input energy per unit of time reaching the comet, that is, the stellar flux.
This leads to the following definition of the evaporation efficiency:

$$ \eta = \log\left[\frac{Z_{\rm H_2O} L_{\rm H_2O} + \frac{1}{2}\dot{M} v_g^2}{F_\star(d)}\right] \qquad (9) $$

We introduced L_H2O ≈ 3 × 10³ kJ kg⁻¹ as the latent heat of vaporization for water,
and F⋆(d) is the stellar flux at a distance d, which is related to the stellar luminosity
L⋆ by F⋆ = L⋆/4πd², with L⋆ = 8.6 L☉. We neglected the gas kinetic energy, which
is an order of magnitude lower than the latent heat of water sublimation.
Reducing numerical factors, and assuming the dust-to-gas mass ratio is constant
and close to 1 such that Z_H2O ≈ Ṁ, the evaporation efficiency can be expressed as:

$$ \eta \approx \log\left(\dot{M} d^2\right) - 1.9 \qquad (10) $$

The distance d is expressed in stellar radius units and Ṁ in kg s⁻¹. Typical values
are expected to be 10⁷ < Ṁ < 5 × 10⁸ kg s⁻¹ and 10 < d < 50 R⋆, yielding 7 < η < 10.
Measurements of the evaporation efficiency. The evaporation efficiency can be
estimated from the measured values of α and A using the conservation of momentum
in the cometary cloud: the total momentum carried by the stellar photons which
are absorbed by the Ca⁺ cloud equates the total momentum gained by these Ca⁺
ions.

On one side, we thus consider the amount of photons absorbed by the Ca⁺ cloud,
accounting for the contribution of both K and H lines, integrated over its projected
surface Σ_c = α π R⋆², where F⋆,d is the stellar flux at distance d from the star. This
leads to an absorbed momentum flux:

$$ \left.\frac{dP}{dt}\right|_{\rm abs} = \sum_{K,H} \int \left(1 - e^{-\tau(\lambda)}\right) \Sigma_c\, \frac{F_{\star,d}(\lambda)}{c}\, d\lambda \qquad (11) $$

$$ = \sum_{K,H} \int \left[1 - \exp\left(-A_{K,H}\, e^{-(v - v_0)^2/\Delta v^2}\right)\right] \frac{\alpha \pi R_\star^2\, F_{\star,d}(\lambda)}{c}\, d\lambda \qquad (12) $$

with A_K = A, A_H = A/2, v − v_0 = c(λ − λ_0)/λ_0 and λ_0 = 3,950 Å.

On the other side, we consider the total momentum P gained per unit of time by
the Ca⁺ ions when they are accelerated from their initial velocity v_i to a final
velocity v_f:

$$ \left.\frac{dP}{dt}\right|_{\rm acc} = \dot{M}_{\rm Ca^+} \left\langle v_f - v_i \right\rangle \qquad (13) $$

We assume v_f to follow a distribution of typical width Δv and mean v_0. The
variation of momentum is averaged with respect to this distribution, with ⟨v_f⟩ = v_0.

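These two independent expressions for the momentum flux can be computed
for each line as a function of α, A and Δv following:

$$ \left.\frac{dP}{dt}\right|_{\rm abs} \approx \frac{(19.3\,{\rm pc})^2\, F_\oplus(\lambda_0)\, \lambda_0\, \Delta v}{c^2\, d^2 R_\star^2}\; \pi R_\star^2\, \alpha \left(2 - e^{-A} - e^{-A/2}\right) $$

$$ \left.\frac{dP}{dt}\right|_{\rm acc} = \dot{M}_{\rm Ca^+} (v_0 - v_i) \approx \dot{M}_{\rm Ca^+} \times \Delta v \qquad (14) $$

where F_⊕(λ_0) = 0.25 × 10⁻¹⁰ erg cm⁻² s⁻¹ Å⁻¹ is the β Pic stellar flux measured
from Earth around wavelength λ_0 ≈ 3,950 Å, accounting for a 0.25 reduction factor
at the bottom of the K and H lines; the distance d is expressed in stellar radius units.
Momentum conservation implies that dP/dt|_acc = dP/dt|_abs, leading to:

$$ \dot{M}_{\rm Ca^+}\, d^2 \approx 1.3 \times 10^{9}\, \alpha \left(2 - e^{-A} - e^{-A/2}\right) \qquad (15) $$

Taking into account the expected abundance of calcium in silicate (Ṁ_Ca⁺ ≈
0.01 Ṁ_dust), we conclude that:

$$ \eta \approx 9.2 + \log\left[\alpha \left(2 - e^{-A} - e^{-A/2}\right)\right] \qquad (16) $$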
Discussion. The distribution of the measured evaporation efficiencies for the
observed exocomets is in good agreement with typically expected values, 7 < η < 10
(Extended Data Fig. 5). The two populations of exocomets have distinct
evaporation efficiency distributions with η_S ≈ 8.6 ± 0.4 and η_D ≈ 9.4 ± 0.1. Exocomets of
population D are thus almost ten times more efficient at capturing and converting
input stellar energy into gas and dust evaporation than exocomets of population S.
In other words, they would have a dust production rate ten times higher, if they
were located at the same distance to the star as population S.

The uncertainties quoted above on the measure of the evaporation efficiency in
each population do not include the effects of the model's approximations, in
particular in the estimation of the cometary cloud area and the velocity of the Ca⁺
ions. We estimated that model uncertainties by a factor of about two or less on these
two quantities lead to an error bar of about ±0.3 for the evaporation efficiency,
which is about three times smaller than the difference seen between the two
populations (Δη ≈ 0.8). This is therefore additional evidence of the existence of two
families of exocomets with distinct intrinsic properties.
Distance and dust production rate. Using the model of ref. 15, we can derive the
distance between the star and the exocomets at the time of their transit. In this
model, ions are supersonically dragged by the evaporating gas flowing out from the
nucleus. At a large distance from the nucleus, the radially expanding ions are slowed
down by the radiation pressure to a subsonic velocity and are further accelerated in
the anti-stellar direction.

As a result, a shock surface is formed at distance r_0 from the nucleus:


se yFMd@ ai 
eS 8nfBGM,.a, 


We introduced γ = 0.01, the typical calcium abundance in silicate grains; β = 77,
the ratio of radiation pressure to gravity for Ca⁺ ions in the β Pic environment;
M⋆ = 1.7 M☉, the mass of β Pic; and a_s = μ/σ, a constant factor accounting for the
shock surface and dependent on μ, the Ca⁺ mass, and σ, the effective
cross-section of the stellar flux absorption by the Ca⁺ ions. The F factor depends on the


(17) 


gas production rate Z (typically equal to the dust production rate Ṁ in Solar System
comets):
M Amy ope? 
2Vell MH,O Attéo 


where m_H2O is the mass of the water molecule; m_H is the hydrogen mass; α_H =
6.67 × 10⁻³¹ m³ is the polarizability of the hydrogen atom; and v_e ≈ 10 km s⁻¹ is
the dust expansion velocity just below the shock surface. The effective cross-section
of stellar flux absorption by the Ca⁺ ions is


AH 1 me? 
o= y | yo IK 
rar Av 4mé mec 


(18) 


(19) 


introducing f_K = 0.69 and f_H = 0.34, the oscillator strengths of the Ca II K and H
lines; and Δv ≈ 1 km s⁻¹, the estimated transition linewidth, taking into account
the natural, thermal and collisional broadenings. Provided r_0 < R⋆, the surface ratio
α = Σ_c/Σ⋆ is roughly given by α ≈ r_0²/R⋆², yielding:

$$ \alpha = 4.5 \times 10^{-14}\, \dot{M}^{4/3} d^{4/3} \qquad (20) $$

Using equation (15), we obtain the distance of each exocomet to β Pic at the time of
the transit, given in stellar radii:

$$ d = 6.2 \times 10^{-9} \times 10^{\eta}\, \alpha^{-3/4} \qquad (21) $$

Our measurements lead to estimates of the distance between 1 R⋆ and 30 R⋆ (Extended
Data Fig. 6), as expected. The dust production rate Ṁ is then deduced from
equations (21) and (10).
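
The chain from measured cloud parameters to distance and dust production rate can be sketched in Python. The sketch assumes the relations read as η ≈ 9.2 + log[α(2 − e^(−A) − e^(−A/2))] for equation (16), d = 6.2 × 10⁻⁹ × 10^η α^(−3/4) for equation (21) and η ≈ log(Ṁd²) − 1.9 for equation (10); the power of ten in equation (21) is an assumption, chosen because it reproduces the quoted mean distances. Function names are hypothetical.

```python
import math


def evaporation_efficiency(alpha, absorption_depth):
    """Equation (16): eta from the surface ratio alpha and absorption depth A."""
    saturation = 2.0 - math.exp(-absorption_depth) - math.exp(-absorption_depth / 2.0)
    return 9.2 + math.log10(alpha * saturation)


def transit_distance(eta, alpha):
    """Equation (21) as read here: distance in stellar radii.

    The 6.2e-9 prefactor is an assumed reading of the source equation.
    """
    return 6.2e-9 * 10.0 ** eta * alpha ** -0.75


def dust_production_rate(eta, distance):
    """Equation (10) inverted: dust production rate in kg/s (d in stellar radii)."""
    return 10.0 ** (eta + 1.9) / distance ** 2


# Representative values quoted for the two populations.
d_s = transit_distance(8.6, 0.1)
d_d = transit_distance(9.4, 0.8)
ratio = dust_production_rate(9.4, d_d) / dust_production_rate(8.6, d_s)
print(f"d_S ~ {d_s:.1f} R*, d_D ~ {d_d:.1f} R*, Mdot_D / Mdot_S ~ {ratio:.1f}")
print(f"eta(alpha=0.1, A=3) = {evaporation_efficiency(0.1, 3.0):.2f}")
```

Evaluated at the population means, this gives distances of roughly 14 and 18 stellar radii; the published values of about 10 and 19 stellar radii come from the full distributions of individual detections, so only rough agreement should be expected.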

Periastron distance and longitude. Assuming that each exocomet exhibits a
near-parabolic orbit, an estimate of the distance to the star, together with the measurement
of the radial velocity at the time of the transit, allows an estimation of the periastron
orientation and distance. We define ϖ to be the longitude of the periastron, which
is the true anomaly of the line of sight with respect to the exocomet's periastron,
and Q to be the periastron distance in units of stellar radius, R⋆. These two
quantities can be expressed with respect to the distance and the radial velocity
by solving:

$$ v_r = \sqrt{\frac{G M_\star}{R_\star d}}\; \frac{\sin\varpi}{\sqrt{1 + \cos\varpi}}, \qquad d = \frac{2Q}{1 + \cos\varpi} \qquad (22) $$

The first equation is solved using a numerical inversion method, and the second is
solved once ϖ is known. We plot the (ϖ, log Q) diagram in Fig. 4.
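
For a parabolic orbit the first relation can also be inverted in closed form, because sin ϖ/√(1 + cos ϖ) = √2 sin(ϖ/2) for |ϖ| < 180°, so that sin(ϖ/2) is the ratio of the radial velocity to the local escape velocity. The Python sketch below uses this identity instead of the numerical inversion mentioned in the text; it is an illustration, not the authors' code. The stellar radius of 1.8 R☉ is an assumed value that is not given in this section, and the function names are hypothetical.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # kg
R_SUN = 6.957e8      # m

M_STAR = 1.7 * M_SUN     # stellar mass quoted in the text
R_STAR = 1.8 * R_SUN     # ASSUMED stellar radius (not given in this section)


def radial_velocity(d, longitude, m_star=M_STAR, r_star=R_STAR):
    """Forward relation: radial velocity (m/s) at distance d (stellar radii)."""
    scale = math.sqrt(G * m_star / (r_star * d))
    return scale * math.sin(longitude) / math.sqrt(1.0 + math.cos(longitude))


def periastron(d, v_r, m_star=M_STAR, r_star=R_STAR):
    """Return (longitude in radians, periastron distance Q in stellar radii)."""
    v_escape = math.sqrt(2.0 * G * m_star / (r_star * d))
    s = v_r / v_escape                       # equals sin(longitude / 2)
    if abs(s) >= 1.0:
        raise ValueError("radial velocity exceeds the parabolic limit at this distance")
    longitude = 2.0 * math.asin(s)
    q = d * (1.0 + math.cos(longitude)) / 2.0
    return longitude, q


# Round trip: an orbit with Q = 10 stellar radii seen at a longitude of 30 degrees.
lon_in = math.radians(30.0)
d_in = 2.0 * 10.0 / (1.0 + math.cos(lon_in))
v_in = radial_velocity(d_in, lon_in)
lon_out, q_out = periastron(d_in, v_in)
print(f"v_r = {v_in / 1e3:.1f} km/s -> longitude = {math.degrees(lon_out):.1f} deg, Q = {q_out:.2f} R*")
```

The closed form is only valid for the strictly parabolic case; for orbits with eccentricity below 1 a numerical solution would still be needed.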


23. Pepe, F. et al. The HARPS search for Earth-like planets in the habitable zone. I.
Very low-mass planets around HD 20794, HD 85512, and HD 192310. Astron.
Astrophys. 534, A58 (2011).

24. Lallement, R., Ferlet, R., Lagrange, A. M., Lemoine, M. & Vidal-Madjar, A. Local
cloud structure from HST-GHRS. Astron. Astrophys. 304, 461-474 (1995).

25. Beust, H., Lagrange-Henri, A. M., Vidal-Madjar, A. & Ferlet, R. The Beta Pictoris
circumstellar disk. IX. Theoretical results on the infall velocities of Ca II, Al III,
and Mg II. Astron. Astrophys. 223, 304-312 (1989).

Extended Data Figure 1 | The Na I spectrum of β Pictoris. a, Na I D2-line
spectrum (5,890 Å). b, Na I D1-line spectrum (5,896 Å). It shows the
superposition of all Na I spectra of β Pic (black dots) compared with the stellar
reference spectrum (red line). Radial velocities are given in the star's rest frame.
The stable Na I line centred at the star's radial velocity is identified as due to
the circumstellar (CS) disk. The sharpness of the Na I D1 and D2 lines and the
steadiness of this circumstellar feature in all spectra confirm the stability of
HARPS on a timescale of years. The narrow absorption lines seen in most of the
spectra and not in the calculated reference spectrum are due to atmospheric
water.

Extended Data Figure 2 | The Ca II reference spectrum of β Pictoris. a, Ca II
K-line spectrum (3,933.66 Å). b, Ca II H-line spectrum (3,968.47 Å). It shows
the superposition of all the Ca II spectra of β Pic (black dots) compared with
the stellar reference spectrum (red line). The stable circumstellar (CS) line is
centred at the star's radial velocity. Variable absorption features are revealed by
their diffuse shapes with respect to the dark upper envelope of the cloud of
points. The predominance of redshifted absorption features is clearly visible.
A small interstellar (IS) line is noticeable on the left of the circumstellar line (ref. 24).

Extended Data Figure 3 | A typical fitted Ca II normalized spectrum. a, Ca II
K line normalized spectrum. b, Ca II H line normalized spectrum. The Ca II
normalized spectrum (black line) is obtained through the division of the
corresponding regular spectrum collected on 27 October 2009 (Fig. 1) by
the reference spectrum plotted in Extended Data Fig. 2. Radial velocities are
given with respect to the stellar rest frame. The fit of each feature detected is
detailed with red dashed lines, and their superposition with a solid red line. The
bottom panels show the residuals of the fit. The grey shading indicates the
±3 km s⁻¹ excluded CS region, where variable absorption features caused
by exocomets are not resolved from the circumstellar line. This spectrum
presents all types of variable absorption features: a broad and shallow
absorption at large radial velocity (+50 km s⁻¹) and a sharp and deep
absorption at small radial velocity (~20 km s⁻¹).

Extended Data Figure 4 | Diagram of the Ca II line depths. Plot of the Ca II K
line depth, p_K, as a function of the Ca II H line depth, p_H, for the 252
independent absorption features with α < 1 caused by individual transiting
comets observed between 2003 and 2011. Using k-mean cluster analysis of
these line depth measurements, two populations of exocomets show up:
the 147 exocomets of population S generate the shallow absorption lines with
p_K < 0.4 (in red) and the 105 exocomets of population D generate the deep
absorption lines with p_K > 0.4 (in blue). The dotted line represents the full
saturation limit p_K = p_H and the dashed line represents the α = 1 limit,
corresponding to cometary clouds with a projected area greater than the
stellar disk area.

Extended Data Figure 5 | Histogram of the evaporation efficiency of
transiting exocomets. The histogram of η, the evaporation efficiencies (in
black), shows a clear bimodal distribution: population S (in red) is centred
on η_S = 8.6 ± 0.4, while population D (in blue) is centred on η_D = 9.4 ± 0.1.
The solid black histogram represents the distribution of evaporation efficiency
for the 252 observed exocomets with α < 1. The two Gaussian curves are
obtained by fitting this histogram with the sum of two Gaussians.

Extended Data Figure 6 | Histogram of the distances between β Pic and
the exocomets at the time of transit. The 105 comets of population D
(in blue) are located further away from the star than the 147 comets of
population S (in red), with d_D ≈ 19 ± 4 R⋆ and d_S ≈ 10 ± 3 R⋆. Distances are
expressed in units of stellar radius (R⋆). The solid black line represents the
distribution of distances for the whole sample of observed exocomets.

Characterizing and predicting the magnetic
environment leading to solar eruptions


Tahar Amari, Aurélien Canou & Jean-Jacques Aly

The physical mechanism responsible for coronal mass ejections has
been uncertain for many years, in large part because of the difficulty
of knowing the three-dimensional magnetic field in the low
corona. Two possible models have emerged. In the first, a twisted
flux rope moves out of equilibrium or becomes unstable, and the
subsequent reconnection then powers the ejection. In the second, a
new flux rope forms as a result of the reconnection of the magnetic
lines of an arcade (a group of arches of field lines) during the
eruption itself. Observational support for both mechanisms has been
claimed. Here we report modelling which demonstrates that twisted
flux ropes lead to the ejection, in support of the first model. After
seeing a coronal mass ejection, we use the observed photospheric
magnetic field in that region from four days earlier as a boundary
condition to determine the magnetic field configuration. The field evolves
slowly before the eruption, such that it can be treated effectively as a
static solution. We find that on the fourth day a flux rope forms and
grows (increasing its free energy). This solution then becomes the
initial condition as we let the model evolve dynamically under conditions
driven by photospheric changes (such as flux cancellation). When the
magnetic energy stored in the configuration is too high, no
equilibrium is possible and the flux rope is 'squeezed' upwards. The
subsequent reconnection drives a mass ejection.

Coronal mass ejections (CMEs) are large-scale eruptive events in the 
solar atmosphere that could have impact’? on satellites and ground-based 
power generation. Theoretical models'’”” of their origins use a specific 
coronal configuration whose evolution is computed from a given ini- 
tial state. A global disruption leading to the ejection of a twisted flux rope 
identified with a CME—through an overlying arcade—is then found to 
occur. The rope may be present in the initial state as a stable equilib- 
rium structure; as an unstable or nearly unstable equilibrium structure 
that evolves freely at later times'*”*; or as a subphotospheric structure 
that is forced by buoyancy to emerge through the solar surface into the 
corona, where it expands’*"*. Alternatively, the initial state may simply 
be one or more arcades evolving through photospheric shearing motions, 
the rope being produced during the disruption itself”. 

The following method can be used to determine which, if either, of 
these situations describes an actual eruptive event. In a first step, con- 
sider the pre-eruptive phase during which the coronal magnetic field can 
be assumed to evolve slowly through a sequence of equilibrium force- 
free configurations. Under this quite reasonable assumption, the field 
evolution can be computed from successive measurements made at the 
photospheric level, and the presence or absence of a twisted flux rope 
can be assessed. The problem is, however, very difficult, and safe con- 
clusions can be reached only using a very accurate numerical code suited 
for force-free magnetic fields as well as high-resolution, low-noise mea- 
surements. In a second step, the configuration obtained slightly before 
the eruption is used as the initial state in a dynamical magnetohydro- 
dynamics (MHD) code along with boundary conditions mimicking the 
physical photospheric processes that make the coronal field evolve. It 
is then possible to look for a disruption of the configuration having pro- 
perties comparable to the observed ones. It is worth noticing that this 


method, once successfully tested against well-documented past events, 
could be applied to forecast the eruption of an active region (the aim of 
studying space weather), the eruptive power at any given time being pre- 
dicted by reconstructing the field at that time and using it in a dynam- 
ical code as indicated above. 

Here we apply this method to US National Oceanic and Atmospheric 
Administration (NOAA) active region 10930 (AR10930), which crossed 
the solar disk during the first half of December 2006. This choice is 
motivated by two factors. First, the region clearly exhibits most of the 
features usually associated with eruptive behaviour. Second, the spec- 
tropolarimeter of the Solar Optical Telescope (SOT) on board the sat- 
ellite Hinode’’ has provided a series of high-precision measurements 
of the photospheric field around the time of the eruption. This prompted 
many groups to study this region, but hitherto none of them has pro- 
vided the complete, fully data-driven picture (including both the pre- 
eruptive and the eruptive phases) that we are seeking here. Most of 
them’**! (see Methods for supplementary references) have concentrated 
on the reconstruction problem, without showing unambiguously the 
presence of the pre-eruptive rope. As for the dynamical evolution of the 
region, an MHD computation leading to an eruption has been proposed”, 
but it starts from an initial state in which a twisted flux tube with ade- 
quate properties has been forced to emerge into the region rather than 
being obtained from a reconstruction based only on the observational 
magnetic data (a similar calculation, but with the tube introduced by 
the insertion method, was recently done for another region“). And the 
mechanism of the eruption has been discussed by comparing some ob- 
servational features with the results of a simple theoretical model in 
which a twisted rope is forced to erupt into a highly sheared arcade”. 

To set the stage, we note that the evolution of NOAA AR10930 was 
associated with the emergence of a positive-polarity sunspot”. This com- 
plex process involved translational and apparent rotational motions of 
the sunspot of the southern region, and led to the elongation of the spots, 
which is the signature of an emerging twisted rope” (Supplementary 
Video 1). A series of flares were observed during the transit of AR10930. 
An X3.4 two-ribbon flare occurred on December 13 from 2:14 UT to 
2:57 UT, with its peak in soft X-rays at 2:40 UT. It had brightenings that 
started around the magnetic polarity inversion line between the two main 
spots, which then moved apart. It was followed by a halo-like CME with 
the potential to create a geomagnetic storm, which was observed using 
the C2 coronagraph on the Large Angle and Spectrometric Coronagraph 
Experiment on board NASA’s Solar and Heliospheric Observatory 
(SOHO) at 2:54 UT. The total energy (kinetic plus gravitational) released 
during this major event was estimated”* to be (1.4–4.5) × 10³² erg. 

We compute the magnetic environment above AR10930 from December 9 to 
December 12 at 20:30 UT by using our extrapolation code 
XTRAPOL”, with the required boundary conditions being provided 
by vector magnetograms taken from the scans of the Hinode/SOT 
spectropolarimeter’’. Several important features appear in the recon- 
structed states. As the evolution proceeds from day D—4 (four days be- 
fore the eruption) to day D (the date of the eruption) as a consequence of 
flux emergence, the field suffers a change from an arcade-like topology 

Figure 1 | Magnetic field evolution during the days before major eruption 
day D. Selected field lines of some magnetic configurations obtained by 
reconstructing the coronal magnetic field around the eruption site using vector 
magnetograms from Hinode/SOT and a static MHD model. Energy builds up 
slowly, with field lines shearing several days before the eruption, during the 
emergence of the positive-polarity sunspot (white) into the background of an 


to a rope-like one (Fig. 1 and Supplementary Video 2). On D—2, the 
model produces two J-shaped structures surrounding a highly sheared 
inner arcade with almost no twist (Fig. 1c), similar to those produced 
in idealized models driven by flux cancellation” *. On D—1, we also use 
our new global reconstruction code MESHMHD” along with compos- 
ite photospheric data from three instruments, which allows us to resolve 
both small- and large-scale features (Extended Data Fig. 1). The active 
region from which the eruption originates is shown to be well isolated 
and not interacting with neighbouring active regions. A flux rope with 
a twist greater than 2π has formed. There is evidence that the tube has a 
hyperbolic structure, which is known from previous models” to be the 
signature of a slow flux and magnetic helicity transfer by magnetic re- 
connection from the photosphere to the twisted rope (Extended Data 
Fig. 2). Some magnetic helicity appears to have been redistributed into 
the rope, resulting in an increase of its twist during the period from 
D−2 to D−1. The structure of the electric currents flowing along it on 
D−1 is shown in Extended Data Fig. 3. 

The accuracy of our computed configurations is attested to by the 
very good agreement existing between some of their characteristic fea- 
tures and coronal observations. In particular, the magnetic lines of the 
twisted rope (which coincide with the lines of the current density in our 
force-free reconstructions) exhibit dips at the precise location of the 
filament of cool material observed in Hα emission using the 
spectroheliograph at the Paris-Meudon Observatory (Fig. 2c, d), and they are well 
aligned with both the X-ray data from the Hinode X-ray Telescope (XRT) 
(Fig. 2b) and the extreme-ultraviolet data from the SOHO Extreme 
Ultraviolet Imaging Telescope (EIT) (Extended Data Fig. 4). The latter 
also show a good alignment with the magnetic lines of some outer parts 
of the region. Furthermore, the series of reconstructions made from 
D−4 to D−1 show that the rope is forming by emergence from below 
the photosphere as the southern spot of the observed active region is 
emerging, with a very good correlation with the observed photospheric 
tongues (Fig. 2a and Supplementary Video 2). 

During the four modelled days, the magnetic energy increases consid- 
erably. Up to four days before the eruption, the amount of accumulated 
free energy—that is, the part of the energy above that of the current-free 
magnetic field—is too low to power it. At D—2, the free energy enters the 
‘possible’ zone of eruption but is still close to the lower-bound estimate. 

older, negative-polarity sunspot (black). a, Four days before the eruption 
(D−4); b, D−3; c, D−2; d, D−1. The magnetic field becomes progressively 
sheared in between the two spots, until a large twisted flux rope has formed 
on D−1. Colour code: yellow, sheared arcades; blue and green, J-shaped loops; 
red, largely twisted field lines. 


On D−1, the free magnetic energy becomes large enough to power the 
observed major eruption (Fig. 3a). A few hours before the eruption, the 
energy is close to that of a particular partially open field, B_po, whose 
importance has been stressed in previous idealized simulations* (Methods). 
Magnetostatic equilibrium is no longer possible when the energy of B_po 
is exceeded, a phenomenon known as catastrophic loss of equilibrium. 

We take the low-corona configurations computed on D−2 and D−1, 
respectively, as initial conditions of our numerical dynamical MHD 
code METEOSOL? *”*. The static solution evolves under the effects of 
a flux decrease’ (associated with flux cancellation), gas motion charac- 
teristic of a sunspot ‘moat flow’, or photospheric turbulent diffusion*”. 

We find that the magnetic configuration corresponding to the December 11, 
10:00 UT (D−2) data does not lead to a major disruption, but 
rather to relaxation to a new equilibrium. Pushing the evolution further, 
however, results in the formation of a small twisted rope that eventually 
suffers a small, confined eruption. We do not get a major disruption 
because not enough flux has been transferred to the rope at D−2, with 
the available free energy staying well below the loss-of-equilibrium threshold. 
This shows that even at an early stage during the build-up of the 
active region the configuration has already acquired some eruptive 
potential (even if small). 

The situation is different when the evolution is computed with the 
data of D−1 at 20:30 UT. The twisted rope has already formed and contains 
a large amount of flux and electric current, and it is found to erupt, 
with the magnetic lines exhibiting a shape very similar to that generally 
reported when an eruption is observed on the solar limb (Fig. 4c, d). The 
overall configuration suffers a major disruption (Fig. 4a, b), which we 
argue explains the occurrence of the eruption of December 13, 2:40 UT. 
At the time of disruption, the magnetic energy of the configuration becomes 
of the same order as the critical energy (that of B_po). We interpret 
this as meaning that the field has suffered a loss of equilibrium, where 
the magnetic tension force no longer balances the magnetic pressure 
force. 

The onset of an eruption has been proposed®'*"* to occur when a rope 
reaches an altitude at which an index n characterizing the vertical decay 
of the horizontal component of the current-free field exceeds a critical 
value n_c ≈ 3/2 (the ‘torus instability criterion’). We have computed n 
for our configuration of day D−1 and found that n is close to n_c (Fig. 3c-e). 

Figure 2 | Twisted flux rope before the major 
eruption. Selected field lines of the reconstructed 
magnetic configuration of December 12, 20:30 UT 
(D−1), with the same colour code as in Fig. 1. 

a, A large rope consisting of several components 
sits between the two spots and is seen to have 
accumulated a large amount of twist (about 2.25π). 
The hyperbolic nature of the rope (field lines 
bifurcating with an X-type topology) is detailed in 
Extended Data Fig. 2. b, Good agreement of the 
shape of some computed field lines with X-ray 
data from Hinode/XRT. c, Hα data from the 
spectroheliograph at the Paris-Meudon 
Observatory reveals that a filament (darker) 
extends in the atmosphere between the two spots. 
d, The filament shown in c coincides with the 
locations of the dips in the computed magnetic 
field (shown as black segments and seen from the 
same vantage point as in c) where cool material can 
sit and be supported against gravity by the 
magnetic force. 

Figure 3 | Accumulation of magnetic energy. a, Evolution of the free 
magnetic energy (black curve) during the four days before the major eruption. 
The actual energy of the major eruption lies in the red zone, defined by lower 
and upper limits estimated from the observations. b, Magnetic energy W_B of 
the configuration (black curve) and theoretical magnetic energy upper bound 
W_po, beyond which equilibrium is no longer possible (red curve), in units of 
the magnetic energy W_π of the (minimum-energy) current-free magnetic 
configuration having the same B_n distribution on the photosphere. During the 
last day (D−1), the magnetic energy of the configuration comes closer to this 
bound. c, Normal component of the magnetic field at D−1. The coloured 
crosses indicate a set of points selected along the photospheric projection of 
the twisted rope. d, Variations with altitude of the index n characterizing the 
decay of the horizontal component of the current-free magnetic field By, 
above each of the coloured points in c. The red dashed line corresponds to 
the critical value n_c ≈ 3/2, beyond which the rope is ejected according to the 
torus instability criterion. e, Variations in n for a subset of points lying 
inside the red rectangle in c, which is located on the left side of the 
southern spot. 

There is then an agreement between our energetic onset criterion and 
the torus instability one. Moreover, n is closer to n_c in the region near 
the left side of the southern spot, which might explain the asymmetric 
ejection of the rope whose eruption is stronger above that region (Fig. 3e). 
Finally, during the MHD phase of the rising of the rope, most of it 
eventually reaches a height at which n = n_c. It has been suggested” that the 
onset of an eruption may be also characterized in terms of a flux 
criterion, with the rope being ejected once the ratio of its axial flux to the flux 
through the main part of the active region becomes too large. We find 
that this parameter is equal to 15.3% at D−1. 
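
The torus instability criterion discussed above can be illustrated with a short numerical sketch. This is not the authors' code: it assumes a hypothetical current-free field whose horizontal component falls off like that of a line dipole buried at depth d below the photosphere, B_h ∝ (z + d)⁻², and evaluates the decay index n(z) = −d ln B_h / d ln z to locate the height at which n first reaches 3/2. The function names and the field profile are illustrative assumptions.

```python
import numpy as np

def decay_index(z, b_h):
    """Decay index n(z) = -d ln(B_h) / d ln(z) on a grid of heights z > 0."""
    return -np.gradient(np.log(b_h), np.log(z))

def critical_height(z, n, n_crit=1.5):
    """First height at which n reaches n_crit (linear interpolation), or None."""
    above = np.nonzero(n >= n_crit)[0]
    if above.size == 0:
        return None
    i = above[0]
    if i == 0:
        return z[0]
    # Interpolate between the last sub-critical and first super-critical samples.
    t = (n_crit - n[i - 1]) / (n[i] - n[i - 1])
    return z[i - 1] + t * (z[i] - z[i - 1])

# Hypothetical profile: line dipole buried at depth d (arbitrary length units).
d = 10.0
z = np.linspace(1.0, 100.0, 2000)
b_h = 1.0 / (z + d) ** 2           # analytic decay index: n = 2 z / (z + d)
n = decay_index(z, b_h)
z_crit = critical_height(z, n)     # analytically, n = 3/2 at z = 3 d
```

For this assumed profile the critical height is three times the burial depth. With a reconstructed field, `b_h` would instead be sampled from the current-free field above each selected photospheric point.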


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

Figure 4 | Evolution and eruption of the twisted flux rope. Selected field lines of some configurations obtained after an evolution driven by photospheric changes, with the configuration of December 12, 20:30 UT being taken as the initial condition. a, In accordance with Fig. 3b, at around 20:30 UT the configuration is almost out of equilibrium and the rope rapidly rises. b, Some time later, it erupts through the arcades that were able to confine it several hours before. c, Observed typical shape exhibited by eruptions on the solar limb (image taken at a wavelength of 304 Å by SDO/AIA on 2012 October 14, at 2:32 UT; courtesy of NASA/SDO and the AIA, EVE and HMI science teams). d, Modelled appearance of the major eruption of December 13, projected onto the solar limb, showing agreement with the typical shape shown in c (for convenience, the arcades have been removed). 

METHODS 


The basic strategy that we have adopted is the following: first, we model the pre- 
eruptive phase through a series of reconstructions of the field (assumed to be in force- 
free equilibrium), with the needed boundary conditions being computed from the 
observational data. Second, we predict the evolution of the field during the eruptive 
phase by using the last obtained equilibrium as the initial condition of a simulation 
done with a full MHD code. The physical results we get depend on both the quality 
of the data used and the mathematical efficiency of the method. 

Magnetic environment model at equilibrium. The magnetic field B can be mea- 
sured only in the photosphere, but we need a coronal value. During the occurrence 
of an eruptive event, B and the highly conducting, low-β plasma in which it is 
embedded evolve very slowly. B can thus be considered as being at each time t in an 
equilibrium force-free state in which it obeys the set of equations”! 


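$$\nabla \times \mathbf{B} = \alpha(\mathbf{r})\,\mathbf{B} \tag{1}$$
$$\nabla \cdot \mathbf{B} = 0 \tag{2}$$
$$\mathbf{B} \cdot \nabla \alpha(\mathbf{r}) = 0 \tag{3}$$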


where α(r) is some scalar function of position r. Equation (3) (which is a 
consequence of equations (1) and (2)) implies that α keeps a constant value along any 
field line. One is thus led to set up the ‘reconstruction problem’: to determine in the 
part Ω of the corona above an active region a magnetic field B satisfying as closely 
as possible the following conditions: (1) it is a finite-energy, force-free magnetic 
field in Ω and (2) it matches on the lower boundary S of Ω the value B^phot provided 
by the measurements done at the photospheric level*". The reason why we added the 
restrictive qualification “as closely as possible” is that there is generally no field B 
satisfying conditions (1) and (2) simultaneously. The raw data can be ‘preprocessed’** 
to try to diminish the incompatibility between both requirements, but it cannot be 
totally suppressed. The best one can do is to set up a resolution scheme which is 
well-posed, in the sense that it leads to a unique solution that is stable with respect to 
small changes in the data. 
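
For completeness, the step from equations (1) and (2) to equation (3) can be written out explicitly. The short derivation below uses only the identity that the divergence of a curl vanishes and the product rule for the divergence.

```latex
\begin{align*}
0 = \nabla\cdot(\nabla\times\mathbf{B})
  &= \nabla\cdot\bigl(\alpha\,\mathbf{B}\bigr)
   && \text{by equation (1)}\\
  &= \alpha\,\nabla\cdot\mathbf{B} + \mathbf{B}\cdot\nabla\alpha \\
  &= \mathbf{B}\cdot\nabla\alpha
   && \text{by equation (2).}
\end{align*}
```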

Several methods*' have been developed to treat the reconstruction problem, each 
one using the data in a specific way and defining a particular environment model, 
and it is important when one studies a particular active region to compare the re- 
sults furnished by various approaches'*. One should also compare these results with 
characteristic observational features of the region (like the presence of a filament 
at some particular location). 

To reconstruct the coronal field, we use a Grad-Rubin-type method that solves a 
mixed hyperbolic-elliptic boundary-value problem for the force-free function α and 
for the field B (ref. 27). This method is based on rigorous mathematical grounds and 
has been proven to be well posed”’. As for the boundary conditions, one has to fix 
the value g of the normal component of the field on the whole boundary S and the 
value λ of α on either the positive-polarity part S⁺ of S (where B_n > 0) or on the 
negative-polarity part S⁻ (where B_n < 0). The XTRAPOL code” solves this problem 
by means of the iterative Grad-Rubin scheme: 


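$$\mathbf{B}^{(n)} \cdot \nabla \alpha^{(n)} = 0 \quad \text{in } \Omega \tag{4}$$
$$\alpha^{(n)} = \lambda \quad \text{on } S^{+} \text{ or } S^{-} \tag{5}$$
$$\nabla \times \mathbf{B}^{(n+1)} = \alpha^{(n)}\,\mathbf{B}^{(n)} \quad \text{in } \Omega \tag{6}$$
$$\nabla \cdot \mathbf{B}^{(n+1)} = 0 \quad \text{in } \Omega \tag{7}$$
$$\hat{\mathbf{n}} \cdot \mathbf{B}^{(n+1)} = g \quad \text{on } S \tag{8}$$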


The initial field B^(0) is chosen to be the potential magnetic field B_π with a normal 
component equal to g on S (by definition B_π is current free, that is, ∇ × B_π = 0). 
XTRAPOL uses a finite-difference method with a representation of B based on a 
vector potential A (with a convenient choice of gauge), which ensures that ∇ · B = 0 
to the accuracy of round-off errors. The elliptic part (equations (6)-(8)) for B^(n+1) 
is solved through a positive-definite linear system, and the hyperbolic part 
(equations (4)-(5)) for α^(n) is solved by transporting the values λ imposed on either S⁺ or 
S⁻ along the magnetic field lines of B^(n). The code uses the Message Passing 
Interface (MPI) library. It provides a solution even if the photospheric flux is not balanced. 
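
The statement that a vector-potential representation keeps the field divergence-free to round-off can be checked on a toy grid. The sketch below is only an illustration under simplifying assumptions (a periodic grid, unit spacing, forward differences, a random vector potential); it does not reproduce XTRAPOL's discretization or gauge choice. The point it shows is that when the discrete curl and the discrete divergence are built from the same commuting difference operators, the divergence of B = curl A cancels identically.

```python
import numpy as np

def fwd(f, axis):
    """Periodic forward difference along one axis (unit grid spacing)."""
    return np.roll(f, -1, axis=axis) - f

def curl(ax, ay, az):
    """Discrete curl built from forward differences."""
    bx = fwd(az, 1) - fwd(ay, 2)
    by = fwd(ax, 2) - fwd(az, 0)
    bz = fwd(ay, 0) - fwd(ax, 1)
    return bx, by, bz

def div(bx, by, bz):
    """Discrete divergence using the same forward differences."""
    return fwd(bx, 0) + fwd(by, 1) + fwd(bz, 2)

rng = np.random.default_rng(0)
ax, ay, az = rng.standard_normal((3, 16, 16, 16))   # arbitrary vector potential
bx, by, bz = curl(ax, ay, az)
max_div = np.abs(div(bx, by, bz)).max()              # stays at round-off level
```

By contrast, a scheme that updates B directly carries a residual divergence that depends on the resolution, which is the distinction drawn in the diagnostics discussed later in this section.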


Magnetic data and boundary conditions. The boundary conditions needed by 
our method, g and λ, are extracted from the photospheric vector magnetograms 
obtained from the scans of the spectropolarimeter of the Solar Optical Telescope™* on 
board the satellite Hinode'’. More precisely, we used level-2 data (available online 
at http://sot.lmsal.com/data/sot/level2d/) obtained by applying the MERLIN inver- 
sion code* to the Stokes parameters I, Q, U and V measured by this instrument. 
The well-known 180° ambiguity suffered by the transverse component of the 
magnetic field is resolved by using the minimum-energy code (ME0) based on a recent 
re-implementation**”’ of the original method”. By using standard transformation 
formulae”, we have also converted the scales and the magnetic components from 
the observer frame to the Cartesian frame tangent to the solar surface and centred 
on the middle point of the vector magnetograms. We have chosen a sequence of four 
vector magnetograms covering the period from December 9, 10:00 UT, to December 
12, 20:30 UT (the last available data before the eruption). Note that the raw data 
provided by the instrument are neither smoothed nor preprocessed. 

The value of g is directly furnished by the data. That of λ is computed from the 
three components of the measured magnetic field B^phot according to” 

$$\lambda = \frac{4\pi}{c}\,\frac{j_z^{\mathrm{phot}}}{B_z^{\mathrm{phot}}} = \frac{(\nabla \times \mathbf{B}^{\mathrm{phot}})_z}{B_z^{\mathrm{phot}}}$$

where j_z^phot is the vertical component of the current density and c is the speed of 
light. To prevent unreliable values of λ near the polarity inversion line where B_z^phot 
is small, λ is set to zero if |B_z^phot| is below a particular value, B_z^cut. A similar cut-off 
is used for the intensity of the tangential component B_t^phot of B. This avoids 
unreliable values of j_z^phot due to sudden variations of B_t^phot below the noise level. It 
should be noted that some improvements leading to non-zero values of λ closer to 
the polarity inversion line could be obtained by applying further smoothing and 
interpolation on the computational mesh. 
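
The recipe above for λ, including the two cut-offs, can be sketched as follows. This is a minimal illustration rather than the processing actually applied to the Hinode/SOT data: the function name, the array layout, the cut-off values and the synthetic test field B = (0, sin ax, cos ax), for which the ratio (∇ × B)_z / B_z equals a everywhere, are all assumptions made for the example.

```python
import numpy as np

def lambda_from_magnetogram(bx, by, bz, dx, dy, bz_cut, bt_cut):
    """Return lambda = (curl B)_z / B_z, set to zero where the data are weak.

    Arrays are indexed [i, j] with x along axis 0 and y along axis 1.
    """
    dby_dx = np.gradient(by, dx, axis=0, edge_order=2)
    dbx_dy = np.gradient(bx, dy, axis=1, edge_order=2)
    curl_z = dby_dx - dbx_dy
    bt = np.hypot(bx, by)                       # tangential field strength
    reliable = (np.abs(bz) >= bz_cut) & (bt >= bt_cut)
    lam = np.zeros_like(bz)
    lam[reliable] = curl_z[reliable] / bz[reliable]
    return lam, reliable

# Synthetic test field with a constant force-free function (hypothetical values).
alpha_true = 0.5
x = np.linspace(0.0, 10.0, 501)
y = np.linspace(0.0, 4.0, 41)
X, _ = np.meshgrid(x, y, indexing="ij")
bx = np.zeros_like(X)
by = np.sin(alpha_true * X)
bz = np.cos(alpha_true * X)
lam, reliable = lambda_from_magnetogram(bx, by, bz, x[1] - x[0], y[1] - y[0],
                                        bz_cut=0.1, bt_cut=0.05)
```

In the synthetic case the recovered λ matches the input value wherever both cut-offs are passed and is zero elsewhere, for example along the strips where B_z changes sign.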

Magnetic environment properties. Using these boundary conditions, we compute 
the solutions of our reconstruction problem with a numerical resolution of 
501 × 331 × 201. The quality of the results is evaluated by calculating the standard 
a posteriori diagnostics, which are found to be much better than those characterizing 
previous reconstructions'**"*!”, In particular, we obtain much smaller values for 
the angle between the magnetic field and the electric current density (3.73° versus 
14.48° (model'* W,,"), these values being computed from the actual diagnostic 
CWsin, which deals with the sine of this angle) and for the functional Ly measuring 
the distance to a true equilibrium (0.07 G? Mm? versus 2.27 G* Mm * in ref. 19). 
Owing to our specific discretization alluded to above, the difference is even larger for 
the functionals measuring the residual divergence of the computed magnetic field. 
We obtain Lg = 10 *°G? Mm versus Lg = 1.15 G? Mm _”, from ref. 19, where the 
reconstructions were made by the optimization method and the constraint ∇ · B = 0 
was not imposed a priori, the field being made only as ‘divergenceless’ as possible. 
Also, we obtain (|f;|) = 2.7 10!” versus 3.6 X 10” ® for the best model!® Wop > 
where a Grad-Rubin scheme is used, but with a discretization leading to a non-zero, 
resolution-dependent value of ∇ · B at each node. As another test, we have considered 
the difference between the measured values of α at the two ends of any magnetic 
line, which in principle should vanish if the boundary data are strictly compatible 
with the existence of a force-free field in Ω. We find that this quantity is relatively 
small for the field we compute despite the fact that our well-posed formulation uses 
only the values λ of α in the positive-polarity region. It in fact takes relatively large 
values for the reconstructions based on the use of all three components of the 
magnetic field on the whole active region. 
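
As an illustration of the kind of a posteriori diagnostic discussed here, the sketch below evaluates a current-weighted mean sine of the angle between the current density and the field on a uniform grid. The definition used, the sum of |J × B| / |B| divided by the sum of |J|, is the form commonly quoted in the extrapolation literature and is an assumption here, since this section does not spell the formula out. The grid, the spacing and both test fields are hypothetical.

```python
import numpy as np

def curl(bx, by, bz, h):
    """Finite-difference curl on a uniform grid of spacing h (axes 0, 1, 2 = x, y, z)."""
    def d(f, axis):
        return np.gradient(f, h, axis=axis, edge_order=2)
    return d(bz, 1) - d(by, 2), d(bx, 2) - d(bz, 0), d(by, 0) - d(bx, 1)

def cw_sin(bx, by, bz, h):
    """Current-weighted mean sine of the angle between curl B and B."""
    jx, jy, jz = curl(bx, by, bz, h)
    j = np.sqrt(jx**2 + jy**2 + jz**2)
    b = np.sqrt(bx**2 + by**2 + bz**2)
    cross = np.sqrt((jy * bz - jz * by) ** 2 +
                    (jz * bx - jx * bz) ** 2 +
                    (jx * by - jy * bx) ** 2)
    # sum(|J| * sigma) with sigma = |J x B| / (|J| |B|) reduces to sum(|J x B| / |B|).
    return np.sum(cross / b) / np.sum(j)

h, a = 0.1, 0.5
x = (np.arange(64) * h)[:, None, None] * np.ones((1, 8, 8))
zero = np.zeros_like(x)

# Force-free test field B = (0, sin(a x), cos(a x)), for which curl B = a B.
cw_ff = cw_sin(zero, np.sin(a * x), np.cos(a * x), h)
angle_ff_deg = np.degrees(np.arcsin(cw_ff))

# Counter-example B = (0, 0, 2 + sin(x)): curl B is perpendicular to B everywhere.
cw_perp = cw_sin(zero, zero, 2.0 + np.sin(x), h)
```

The force-free field gives a value close to zero and the counter-example gives a value close to one, which brackets the range in which angles such as those quoted above would fall.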

We also calculate the free magnetic energy ΔW (that is, the difference between the 
magnetic energy of the solution and that of the associated current-free field B_π) 
stored in the equilibria. This is a very important quantity to be compared with the 
energy released during the following X-class eruption. For our last computed 
pre-flare configuration, we find ΔW = 7 × 10³² erg, which is above the upper limit of 
the estimate ((1.4–4.5) × 10³² erg) of the energy released during the flare”®. The good 
observational compatibility of this result can be taken as another indication of the 
quality of our reconstructions (in particular, we recall that having a low value of L_d 
is well known to be necessary for obtaining an accurate value of the energy*’). Note 
that the energies previously found!" (ΔW = 5 × 10³² and 1.3 × 10³² erg) are also 
compatible with the energy release estimates, with the second value’ being close to 
the lower limit. 
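
The free-energy bookkeeping can be made concrete with a few lines of arithmetic. The sketch below assumes Gaussian units (field in gauss, volume in cm³, energy in erg) and a uniform grid; the function names and the two uniform test fields are hypothetical and serve only to check the arithmetic, not to represent a real potential field and its sheared counterpart.

```python
import numpy as np

def magnetic_energy(bx, by, bz, cell_volume_cm3):
    """Magnetic energy W = (1 / 8 pi) * sum(B^2) * dV in erg, with B in gauss."""
    return np.sum(bx**2 + by**2 + bz**2) * cell_volume_cm3 / (8.0 * np.pi)

def free_energy(b_field, b_potential, cell_volume_cm3):
    """Delta W = W(B) - W(B_potential); both fields are (bx, by, bz) tuples on one grid."""
    return (magnetic_energy(*b_field, cell_volume_cm3)
            - magnetic_energy(*b_potential, cell_volume_cm3))

# Hypothetical box: 10 x 10 x 10 cells, each 1,000 km (1e8 cm) on a side.
cell = (1.0e8) ** 3
zeros = np.zeros((10, 10, 10))
b_pot = (zeros, zeros, zeros + 100.0)               # uniform 100 G vertical field
b_sheared = (zeros, zeros + 100.0, zeros + 100.0)   # same vertical field plus 100 G of shear
delta_w = free_energy(b_sheared, b_pot, cell)       # about 4e29 erg for this toy box
```

The toy box is far smaller than an active region, so its free energy lies well below the range quoted above for the event; the example is meant only to show the unit conventions.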

As noted above, it is important to reconstruct the coronal field by different meth- 
ods and to check for the coherence of the results. We have also done reconstruc- 
tions using a newly developed numerical code, MESHMHD”, which solves the 
equilibrium equations (4)-(8) in the whole spherical corona. It is an adaptive un- 
structured (tetrahedral) mesh code, which includes in particular a different scheme 
for computing α. This scheme solves a linear non-symmetric system using the GMRES 
algorithm“. The boundary conditions are provided by composite photospheric data 
from three instruments: a Hinode/SOT vector magnetogram at the active region 
scale, a SOHO/MDI full-disk longitudinal magnetogram and a SOLIS synoptic 
map with a latitude-longitude resolution of 1,368 × 2,824. As it is based on an 
unstructured mesh, MESHMHD can introduce high resolution where it is needed in 
the region and around the flux rope, and lower resolution elsewhere. As a striking 
result, we found that the two codes detect the presence of a twisted flux rope and 
capture its bifurcation to different anchoring points on the east side (Extended Data 
Fig. 2). 

Prediction of the later evolution by using a dynamical MHD code. During the 
eruptive phase, the coronal magnetic field evolves according to its own internal 
dynamics and can then no longer be determined from its photospheric values. But 
we can use our dynamical MHD code METEOSOL”™, which solves the full system 
of MHD equations, to predict its evolution from the last computed pre-eruptive 
equilibrium state. We inject that state into the code as an initial condition and select 
boundary conditions able to describe the actual photospheric processes that force 
the field to evolve. In our previous theoretical studies***°, we identified three such 
processes and showed that each of them can trigger a flare and a CME: 

(1) Partial cancellation near the polarity inversion line“. 

(2) Turbulent diffusion**“*, which is an important process permanently acting 
on the whole surface of the Sun, where it leads to the continuous dispersion of active 
regions. It can be characterized by a coefficient of turbulent diffusion xg = 10° or 
10+. 

(3) Plasma motions that diverge from the two spots of a bipolar region and con- 
verge towards the polarity inversion line in its vicinity, into which they transport a 
part of the magnetic flux*. These ‘moat flows’, which have been observed in an active 
region*”“’, have their source just outside the penumbra, near the locus at which the 
Evershed flow disappears. 

We successively compute three MHD evolutions, each starting from the same 
initial state and driven by one of the processes above. The processes are introduced 
into the code by requiring the tangential component of the electric field on S to as- 
sume a specific form. 
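
The effect of the second driver, turbulent diffusion, on the photospheric flux distribution can be illustrated with a toy calculation. The sketch below is not the METEOSOL boundary condition, which is imposed through the tangential electric field; it simply integrates a two-dimensional diffusion equation for the normal field of a hypothetical bipole on a periodic grid, with arbitrary units and parameter values chosen for the example.

```python
import numpy as np

def diffuse_flux(bn, kappa, dx, dt, steps):
    """Explicit (FTCS) integration of d(Bn)/dt = kappa * Laplacian(Bn), periodic in x and y."""
    if kappa * dt / dx**2 > 0.25:
        raise ValueError("time step too large for the explicit scheme")
    bn = bn.copy()
    for _ in range(steps):
        lap = (np.roll(bn, 1, 0) + np.roll(bn, -1, 0) +
               np.roll(bn, 1, 1) + np.roll(bn, -1, 1) - 4.0 * bn) / dx**2
        bn += kappa * dt * lap
    return bn

# Hypothetical bipole: two Gaussian spots of opposite polarity (arbitrary units).
n = 64
x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
spot = lambda x0, y0: np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * 3.0 ** 2))
bn0 = spot(26, 32) - spot(38, 32)
bn1 = diffuse_flux(bn0, kappa=1.0, dx=1.0, dt=0.2, steps=200)
net0, net1 = bn0.sum(), bn1.sum()                             # signed flux is conserved
unsigned0, unsigned1 = np.abs(bn0).sum(), np.abs(bn1).sum()   # unsigned flux decays
```

The signed flux is unchanged while the unsigned flux drops as opposite polarities meet at the inversion line, which is the flux-cancellation behaviour that the drivers listed above have in common.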

The three types of boundary condition lead after a short interval to a major dis- 
ruption. This is similar to what we obtained in the corresponding theoretical models 
previously developed. There is, however, an important difference: here the initial 
presence and properties of the twisted rope are no longer postulated, but are fur- 
nished by our environment model, which determines to a very good approximation 
the actual initial structure of the field above the active region. Reconstruction has 
shown the presence of a rope in an active region”, but this has been rare. A striking 
result of our modelling is that the eruption is due to a loss of equilibrium occurring 
when the energy of the configuration becomes of the order of the energy of the 
partially open field B_po defined below. 

The partially open field. To define B_po, we first construct an additional magnetic 
field, the open field B_σ, having the same normal component g on S as the evolving 
coronal field B. In contrast to the potential field B_π, most of whose magnetic lines are 
closed (each one connects two points of S), B_σ has all its field lines open (they 
connect S to the upper and lateral boundaries of the computational box). It is current 
free except on the surface separating outgoing lines from ingoing ones, across which 
it reverses. As with B_π, B_σ is in equilibrium in the sense that it does not exert any 
force on the plasma in which it is embedded. We can also construct equilibrium 
fields, B_i, satisfying the same boundary condition and ‘interpolating’ between B_π 
and B_σ. These fields are only partially open (they have both closed and open lines) 
and they are also current free except on a reversal surface. The field B_po is one of 
these fields B_i. It is selected by requiring that its open lines connect to S in the 
region where the electric currents are concentrated. This field was first introduced in 
an analytical theory”' that describes the evolution of B as a sequence of equilibrium 
states, this quasi-static approximation being valid as long as the evolution is slow. It 
was argued on general grounds that W(B) ≤ W(B_po) ≤ W(B_σ) (with W(B) denoting 
the energy of B), and that B starts experiencing a fast expansion leading to its 
partial opening when W(B) approaches W(B_po). Such fast evolution implies a 
breakdown of the quasi-static approximation and one needs to adopt the dynamical 
approach described in the previous item to get a proper description of it. 
Extended Data Figure 1 | Another multiscale model. Full-Sun magnetic 
configuration obtained using a composite data set (Hinode/SOT and SOLIS 
synoptic map) and the state-of-the-art numerical code MESHMHD, which is a 
tetrahedral adaptive-mesh equilibrium code. Local and global scales are both 
accessible using very high-resolution data around the active region and 
lower resolution elsewhere. The twisted rope obtained with XTRAPOL is fully 
recovered. a, Global view showing the disk of tetrahedral-cell mesh and the 
spherical photosphere, where we have indicated the various resolutions for the 
northern hemisphere and, in transparency behind the disk, for the southern 
hemisphere. b, Zoom onto the active region showing the high resolution used 
around it. c, Closer look at the rope, with a cut showing how the adaptive 
scheme allows high mesh resolution in the regions where the coronal electric 
current and magnetic field are stronger. d, Another point of view, exhibiting 
the large extent of the rope, which is still confined by the overlying field 
lines (orange). 
Extended Data Figure 2 | Hyperbolic flux tube. a, Breaking of the twisted 
rope into various components to exhibit its hyperbolic nature, using the 
same colour code as in Fig. 1. b, Core of the rope, which is highly twisted (by 
about 2.25π). c, Underlying highly sheared arcades below the core. d, Two 
J-shaped arcades whose central parts become tangential to each other. 
e, One of the J-shaped set of loops (green) above the sheared arcades (yellow), 
becoming tangential to each other near the neutral line. 
Extended Data Figure 3 | Signature of the pre-eruptive current density of 
the reconstructed magnetic configuration of 12 December, 20:30 UT. We 
have plotted here two isosurfaces of the force-free function α measuring the 
ratio of the electric current density to the magnetic field: a, α = −0.23 Mm⁻¹; 
b, α = −0.05 Mm⁻¹. These isosurfaces are coloured according to the values of 
the modulus of the current density, |j|. The large-scale structure of the twisted 
rope (and of small parts above) is well exhibited by this quantity α in a, in 
agreement with Fig. 2b, whereas weaker electric currents (overlying) structures 
are shown in b. 
Extended Data Figure 4 | Extreme-ultraviolet emission and magnetic 
structure. Selected field lines of the reconstructed magnetic configuration of 
12 December, 20:30 UT, overlaid on an SOHO/EIT extreme-ultraviolet 
emission image taken at 23:49 UT. The emission is well correlated with the 
magnetic lines in the region of the twisted rope and in the regions of 
approximately current-free loops, such as that located on the right-hand side of 
the rope. 

Piezoelectricity of single-atomic-layer MoS₂ for 
energy conversion and piezotronics 

Wenzhuo Wu, Lei Wang, Yilei Li, Fan Zhang, Long Lin, Simiao Niu, Daniel Chenet, Xian Zhang, Yufeng Hao, 
Tony F. Heinz, James Hone & Zhong Lin Wang 


The piezoelectric characteristics of nanowires, thin films and bulk 
crystals have been closely studied for potential applications in sensors, 
transducers, energy conversion and electronics. With their high crystallinity 
and ability to withstand enormous strain, two-dimensional 
materials are of great interest as high-performance piezoelectric materials. 
Monolayer MoS₂ is predicted to be strongly piezoelectric, an 
effect that disappears in the bulk owing to the opposite orientations 
of adjacent atomic layers. Here we report the first experimental 
study of the piezoelectric properties of two-dimensional MoS₂ and 
show that cyclic stretching and releasing of thin MoS₂ flakes with an 
odd number of atomic layers produces oscillating piezoelectric voltage 
and current outputs, whereas no output is observed for flakes with 
an even number of layers. A single monolayer flake strained by 0.53% 
generates a peak output of 15 mV and 20 pA, corresponding to a power 
density of 2 mW m⁻² and a 5.08% mechanical-to-electrical energy 
conversion efficiency. In agreement with theoretical predictions, the 
output increases with decreasing thickness and reverses sign when 
the strain direction is rotated by 90°. Transport measurements show 
a strong piezotronic effect in single-layer MoS₂, but not in bilayer 
and bulk MoS₂. The coupling between piezoelectricity and semiconducting 
properties in two-dimensional nanomaterials may enable 
the development of applications in powering nanodevices, adaptive 
bioprobes and tunable/stretchable electronics/optoelectronics. 
Crystal structure and symmetry dictate the physical properties of a 
material and its interaction with external stimuli. Materials with polarization 
domains, such as Pb(Ti,Zr)O₃, or with non-centrosymmetric structure, 
such as ZnO and GaN, are piezoelectric and have wide applications 
in sensors, transducers, power generation and electronics. Layered 
materials, such as graphite, hexagonal boron nitride (h-BN) and many 
transition-metal dichalcogenides, are centrosymmetric in their bulk 
three-dimensional form but may exhibit different symmetry when thinned 
down to a single atomic layer. In graphene, inversion symmetry 
is preserved because both atoms in the unit cell are identical, whereas 
monolayer h-BN and transition-metal dichalcogenides become non-centrosymmetric 
because of the absence of an inversion centre, which 
leads to novel properties such as valley-selective circular dichroism 
and large second-order nonlinear susceptibility, corresponding to the 
optical second-harmonic generation (SHG) process. Single-atomic-layer 
h-BN, MoS₂, MoSe₂ and WTe₂ have also been theoretically predicted 
to show piezoelectricity as a result of strain-induced lattice distortion 
and the associated ion charge polarization, suggesting possible applications 
of two-dimensional (2D) nanomaterials in nanoscale electromechanical 
devices that take advantage of their outstanding semiconducting 
and mechanical properties. Here we report an experimental observation 
of piezoelectricity in single-atomic-layer 2D MoS₂ and its application 
in mechanical energy harvesting and piezotronic sensing. Cyclic 
stretching and releasing of odd-layer MoS₂ flakes produces oscillating 
electrical outputs, converting mechanical energy into electricity. 
The strain-induced polarization charges in single-layer MoS₂ can also 
modulate charge carrier transport at the MoS₂–metal barrier and enable 
enhanced strain sensing. We have also observed large piezoresistivity 
in even-layer MoS₂, with a gauge factor of about 230 for the 
bilayer material, which indicates a possible strain-induced change in band 
structure. Our study demonstrates the potential of 2D nanomaterials 
in powering nanodevices, adaptive bioprobes and tunable/stretchable 
electronics/optoelectronics. 

In our experiments, MoS₂ flakes were mechanically exfoliated onto 
a polymer stack consisting of water-soluble polyvinyl alcohol and 
poly(methyl methacrylate) on a Si substrate, with the total polymer thickness 
tuned to be 275 nm for good optical contrast. Few-layer MoS₂ flakes were 
first identified under an optical microscope. The layer thickness was then 
measured by atomic force microscopy (Fig. 1a, inset) and confirmed by 
Raman spectroscopy (Extended Data Fig. 1a). SHG was subsequently 
used to determine the crystallographic orientation of the MoS₂ flakes 
(Methods and Extended Data Fig. 1b). Figure 1b shows a polar plot of 
the second-harmonic (SH) signal intensity from a single-layer MoS₂ 
flake as a function of the crystal’s azimuthal angle. Here we measured 
the SH component perpendicular to the excitation polarization. The lattice 
orientation was determined by fitting the angle dependence of the SH 
intensity with I = I₀ sin²(3θ), where θ denotes the angle between the 
direction of the ‘armchair’ edge and the polarization of the excitation 
laser, and I₀ is the maximum intensity of the SH response. Figure 1a 
schematically depicts the derived lattice orientation superimposed on 
the optical image of a single-layer MoS₂ flake; the x axis is taken to be 
along the ‘armchair’ direction, and the y axis along the ‘zigzag’ direction. 
After optical characterization, flakes were transferred to 
a polyethylene terephthalate (PET) flexible substrate using methods 
described previously. Electrical contacts made of Cr/Pd/Au (1 nm/20 nm/50 nm) 
were deposited with the metal–MoS₂ interface parallel to the 
y axis. Figure 1c shows a typical flexible device with the single-layer 
MoS₂ flake outlined by a black dashed line. When the substrate was bent 
mechanically, uniaxial strain was applied to the MoS₂ with a magnitude 
proportional to the inverse bending radius (Fig. 1d and Methods). The 
applied strain was limited to 0.8% to avoid sample slippage. We studied 
the piezoelectric response by applying strain to a device coupled to an 
external load resistor (Fig. 1d). In this configuration, strain-induced 
polarization charges at the sample edges can drive the flow of electrons 
in the external circuit. When the substrate is released, electrons flow back 
in the opposite direction. 
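
The SH fitting step described above can be illustrated with a minimal sketch. It assumes the polar scan is sampled at uniformly spaced angles over a full turn, and it introduces an offset angle `theta0` (the lab-frame direction of the armchair edge), which is a hypothetical parameter and not a symbol from the text. Because the pattern repeats every 60°, the offset is only recovered modulo 60°. The data below are synthetic.

```python
import math

def fit_sh_pattern(theta_deg, intensity):
    """Fit I = I0 * sin^2(3 * (theta - theta0)) to uniformly sampled data.

    Uses sin^2(x) = (1 - cos(2x)) / 2, so the signal is a constant plus a
    single cos/sin pair at 6*theta; for uniform sampling over 360 degrees the
    coefficients follow from simple Fourier projections.
    Returns (I0, theta0 in degrees, folded into [0, 60)).
    """
    n = len(theta_deg)
    mean = sum(intensity) / n                      # equals I0 / 2
    a = 2.0 / n * sum(i * math.cos(math.radians(6 * t))
                      for t, i in zip(theta_deg, intensity))
    b = 2.0 / n * sum(i * math.sin(math.radians(6 * t))
                      for t, i in zip(theta_deg, intensity))
    # a = -(I0/2) cos(6*theta0), b = -(I0/2) sin(6*theta0)
    theta0 = math.degrees(math.atan2(-b, -a)) / 6.0
    return 2.0 * mean, theta0 % 60.0

# Synthetic scan: I0 = 2.0 (arbitrary units), armchair edge offset by 10 degrees.
angles = list(range(0, 360, 10))
signal = [2.0 * math.sin(math.radians(3 * (t - 10))) ** 2 for t in angles]
i0_fit, theta0_fit = fit_sh_pattern(angles, signal)
```

With noisy or non-uniformly sampled data, a general least-squares fit would be the more appropriate choice; the projection form is used here only to keep the sketch dependency-free.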

Figure 2a and Extended Data Fig. 3 show the piezoelectric current 
and voltage responses of a single-layer MoS₂ device. When strain was applied in 
the x direction (‘armchair’ direction), positive voltage and current output 
were observed with increasing strain, and negative output was observed 
with decreasing strain, directly demonstrating the conversion of 
mechanical energy into electricity (see also Extended Data Fig. 4 and 
Methods). Both responses increased with the magnitude of the applied 
strain: for single-layer MoS₂ with a width of ~5 μm and a length of 
~10 μm, the peak open-circuit voltage reached 18 mV and the peak 
short-circuit current reached 27 pA (Fig. 2b), with current and voltage 
responsivities of 55.1 ± 12.3 pA and 32.8 ± 4.5 mV, respectively, for 
each 1% change in strain. There were no significant electrical outputs 
from bare PET substrates without a single-layer MoS₂ flake (Extended 
Data Fig. 5b). 

The dependence of piezoelectric charge polarization on the directions 
of principal strain in 2D materials was also investigated. The coupling 
between the polarization ($P_i$) and strain ($\varepsilon_{jk}$) tensors can be quantified 
to first order by the third-rank piezoelectric tensor $e_{ijk} = \partial P_i / \partial \varepsilon_{jk}$, 
where $i, j, k \in \{1, 2, 3\}$, with 1, 2 and 3 corresponding to the x, y and z 
Figure 2 | Piezoelectric outputs from single-layer and multi-layer MoS₂ 
devices. a, Voltage response with 1 GΩ external load and short-circuit current 
response of a single-layer MoS₂ device under periodic strain in two different 
principal directions. Top: applied strain as a function of time. Middle: 
corresponding piezoelectric outputs from single-atomic-layer MoS₂ when 
strain is applied in the x direction (armchair direction). Bottom: corresponding 
piezoelectric outputs from the same device when strain is applied in the y 
direction (zigzag direction). The phase difference highlighted by black dashed 
lines is obtained by theoretical derivation, not experimental measurement, and 
has been intentionally exaggerated for clarity. Red, blue and black 
arrows represent the directions of polarization, the polar axis of MoS₂ and 
Figure 1 | Single-layer MoS₂ piezoelectric device 
and operation scheme. a, Optical image of the 
single-atomic-layer MoS₂ flake with superimposed 
lattice orientation derived from SHG results. Blue 
and yellow spheres represent Mo and S atoms, 
respectively. Inset: atomic force microscopy image 
of the flake. Scale bar, 2 μm. b, Polar plot of the SH 
intensity from single-layer MoS₂ as a function of 
the crystal’s azimuthal angle θ. The symbols are 
experimental data and the solid lines are fits to the 
symmetry analysis described in the text. c, A typical 
flexible device with single-layer MoS₂ flake and 
electrodes at its zigzag edges. Inset: optical image 
of the flexible device. d, Operation scheme of the 
single-layer MoS₂ piezoelectric device. When the 
device is stretched, piezoelectric polarization 
charges of opposite polarity (plus and minus 
symbols) are induced at the zigzag edges of the 
MoS₂ flake. Periodic stretching and releasing of 
the substrate can generate piezoelectric outputs 
in external circuits with alternating polarity 
(as indicated by the red arrows). 



axes, respectively. Symmetry analysis of the $D_{3h}$ point group suggested 
that there was only one non-zero independent coefficient, $e_{11}$, for single-layer 
MoS₂. The in-plane polarization along the x axis, sensed by the metal 
electrodes as shown in Fig. 1d, can be expressed as $P_1 = e_{11}(\varepsilon_{11} - \varepsilon_{22})$, 
whereas $P_2$ along the y axis is related to the pure shear strain $\varepsilon_{12}$ and can 
be ignored in these experiments. A distinctive consequence of this symmetry 
is that the output is expected to reverse sign when the strain is 
rotated from the x (‘armchair’) to the y (‘zigzag’) direction. This was 
verified experimentally, as shown in the bottom panel of Fig. 2a. 
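
The sign reversal follows directly from the form of the polarization expression, and a short sketch makes that explicit. The coefficient value below is a normalized placeholder, not a measured or quoted number, and the example idealizes the loading as purely uniaxial (any transverse contraction of the flake is ignored, which is an assumption of this sketch).

```python
def polarization_x(e11, eps_xx, eps_yy):
    """In-plane polarization along x for a single layer with D3h symmetry:
    P_1 = e11 * (eps_11 - eps_22)."""
    return e11 * (eps_xx - eps_yy)

E11 = 1.0        # placeholder coefficient in arbitrary units (hypothetical)
STRAIN = 0.005   # 0.5% uniaxial strain, within the 0.8% limit used in the experiments

# Same strain magnitude applied along the armchair (x) and zigzag (y) directions.
p_armchair = polarization_x(E11, STRAIN, 0.0)
p_zigzag = polarization_x(E11, 0.0, STRAIN)
```

The two results have equal magnitude and opposite sign, which is the behaviour reported for the bottom panel of Fig. 2a.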

To quantify the power output of the piezoelectric circuit, it is necessary 
to study the voltage and current outputs as a function of load resistance, 
as shown in Fig. 2c (see Extended Data Fig. 6 for circuit details). The 
principal strains, respectively. b, Dependence of piezoelectric outputs from a 
single-layer MoS₂ device on the magnitude of the applied strain. Mean values 
from 20 technical replicates are indicated. Error bars represent s.d. 
c, Dependence of voltage and current outputs from a single-layer MoS₂ device 
under 0.53% strain as a function of load resistance. Mean values from 20 
technical replicates are indicated. Error bars represent s.d. d, Cyclic test 
showing the stability of a single-layer MoS₂ device over a prolonged period. 
e, Evolution of the piezoelectric outputs with increasing number of atomic 
layers (n) in MoS₂ flakes. For each device, mean values from 20 technical 
replicates are indicated. Error bars represent s.d. 
output current was constant for a load resistance of up to ~1 MΩ and 
then decreased with increasing load, whereas the output voltage remained at 
~0 V up to that point and then began to increase. The maximum instantaneous 
power delivered to the load at 0.53% strain was achieved for a 
load resistance of ~220 MΩ and reached 55.3 fW (5.53 × 10⁻¹⁴ W), with 
a corresponding power density of 2 mW m⁻² (Extended Data Fig. 6). 
The conversion efficiency of the single-layer MoS₂ nanogenerator, which 
is the ratio of the electric power delivered to the load to the total mechanical 
deformation energy stored in the single-layer MoS₂ after being 
strained, can therefore be estimated as ~5.08% (Methods). This energy 
conversion was stable over time, as shown for cyclic loading up to 0.43% 
strain at 0.5 Hz for 300 min (Fig. 2d and Extended Data Fig. 7). The 
observed slight decrease in output may have been caused by mechanical 
fatigue of the flexible substrate. 
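
As a rough consistency check on the quoted figures, the load voltage and current at the maximum-power point can be back-calculated from P = V²/R = I²R. This is only a sketch using the numbers stated above; the "implied area" is simply what the quoted power and power density give when divided, not a device dimension reported in the text.

```python
import math

P_MAX_W = 55.3e-15          # maximum instantaneous power quoted in the text
R_LOAD_OHM = 220e6          # load resistance at the maximum-power point
POWER_DENSITY_W_M2 = 2e-3   # quoted power density, 2 mW per square metre

def load_voltage(power_w, resistance_ohm):
    """Voltage across a resistive load dissipating the given power."""
    return math.sqrt(power_w * resistance_ohm)

def load_current(power_w, resistance_ohm):
    """Current through a resistive load dissipating the given power."""
    return math.sqrt(power_w / resistance_ohm)

def implied_area_um2(power_w, density_w_m2):
    """Active area implied by a power and a power density, in square micrometres."""
    return power_w / density_w_m2 * 1e12

v_load = load_voltage(P_MAX_W, R_LOAD_OHM)       # about 3.5 mV
i_load = load_current(P_MAX_W, R_LOAD_OHM)       # about 16 pA
area = implied_area_um2(P_MAX_W, POWER_DENSITY_W_M2)
```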

We next examined the evolution of the piezoelectric signal with an 
increasing number of atomic layers (n). As discussed above, because of 
the opposite orientation of alternating layers in the most common (2H) 
form of MoS₂, flakes with even n are expected to be centrosymmetric 
and thus lose both their piezo response and SHG signal. For the same 
reason, in samples with an odd number of layers, the piezo response and 
SHG should return. Figure 2e shows the measured piezoelectric output 
for MoS₂ flakes with n = 1, 2, 3, 4, 5 and 6, and for a bulk MoS₂ flake 
with a thickness of more than 100 nm. In line with the above picture, 
the SHG intensity was strong for n = 1, 3, 5 and disappeared for 
n = 2, 4, 6 and the bulk (Extended Data Fig. 8), consistent with previous 
reports. To measure the piezoelectric response, the source and 
drain electrodes were made at the zigzag edge for the odd-layer flakes 
and at an arbitrary angle for the even-layer flakes, because of the absence 
of an SHG signal. Almost no detectable output can be seen for the bulk flake 
and even-layer samples. For odd-layer samples the piezoelectric output 
is large and decreases roughly as the inverse of n. These results confirm 
that single-layer MoS₂ with broken inversion symmetry has a strong 
intrinsic piezoelectric response, whereas centrosymmetric bilayers and 
bulk crystals are non-piezoelectric. 
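
The layer-number dependence described above can be summarized in a few lines. This is a sketch of the stated trend only (zero for even n, roughly 1/n for odd n); the function names are illustrative, and the 1/n scaling is the approximate behaviour reported in the text rather than an exact law.

```python
def is_piezo_active(n_layers):
    """2H-stacked flakes lack an inversion centre only for an odd layer count."""
    return n_layers % 2 == 1

def expected_output(n_layers, monolayer_output):
    """Idealized output for an n-layer flake, relative to the monolayer value."""
    if n_layers < 1:
        raise ValueError("n_layers must be a positive integer")
    if not is_piezo_active(n_layers):
        return 0.0                      # centrosymmetric: no piezo response
    return monolayer_output / n_layers  # approximate inverse-n decay

# Example: relative outputs for n = 1..6 with a 15 mV monolayer peak (abstract value).
trend = [expected_output(n, 15.0) for n in range(1, 7)]
```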

We next characterized the changes in direct-current electrical transport 
properties of the devices with strain, in a two-terminal configuration 
with the polarity of the applied voltage defined with respect to the 
drain electrode. The metal–semiconductor–metal (MSM) device as fabricated 
consisted of two back-to-back Schottky barriers, and transport 
across the reverse-biased Pd–MoS₂ Schottky barrier limited the current 
flow. In this configuration, changes in transport behaviour arose 
largely from two effects: the piezotronic effect, in which strain-induced 
charge asymmetrically modulated the Schottky barriers, and a piezoresistive 
effect, in which strain-induced bandgap change modulated the 
entire resistance of the device. For the single-layer device, the current–voltage 
curve shifted leftwards (towards negative drain bias) under tensile 
strain, and rightwards with compressive strain (Fig. 3a). The opposite 
trend was observed under negative drain bias. This asymmetric modulation 
was similar to the piezotronic effect reported for piezoelectric semiconductors 
with wurtzite and zincblende structures, in which the modulation 
of carrier transport arises as a result of piezoelectric polarization in the 
crystal (Methods). Here, piezoelectric polarization charges at the zigzag 
edges were able to affect the metal–MoS₂ contacts directly (Fig. 3c) 
by modifying the concentration or distribution of free carriers in MoS₂ 
as well as by modulating the electronic charges in interface states, such 
that the mechanical strain functioned as a controlling gate signal. Two 
points should be noted here. First, MoS₂ contacted Pd electrodes in two 
ways in our experiment: at its zigzag end edges, where the piezoelectric 
polarization charges were distributed, and at the top surface, which may 
not have had piezoelectric charges. Although the polarization charges 
were induced at the zigzag edge interface, they were still able to affect 
the electrical transport across the whole Schottky contact area formed 
between MoS₂ and metal. This was probably because the 
majority of the electric field under bias was distributed at the end-edge contacts, 
in an analogous manner to previous studies on the piezotronic effect in ZnO 
nanowires, in which the electric field was focused at the end surface 
although the contact also occurred at the side surfaces. Second, the free 
charge carriers in monolayer MoS₂ were not taken into consideration 
for simplicity in the band diagrams (Fig. 3c). In reality, the finite carrier 
density in MoS₂ (which depended on factors such as intrinsic and environmental 
doping and varied from flake to flake) may have resulted in 
partial screening of the strain-induced polarization charges and hence 
could have affected its piezoelectric performance (Methods), but it is 
still possible to observe its piezoelectric power output and piezotronic 
transport characteristics. The results in Fig. 3a can be used to determine 
the crystallographic orientation of the flake uniquely as having the S-edge 
and Mo-edge at the drain and source electrodes, respectively. Controllable 
modulation of M–S contacts or p–n junctions in 2D nanomaterials by 


Figure 3 | Direct-current electrical 
characterizations of single-layer and bilayer 
MoS₂ devices under strains. a, The asymmetric 
modulation of carrier transport by strains under 
opposite drain bias in a single-layer device shows 
characteristics of a piezotronic effect. b, The 
symmetrical modulation of carrier transport by 
strains under opposite drain bias in a bilayer device 
shows characteristics of a piezoresistive effect. 
c, Band diagrams explaining the piezotronic 
behaviour observed in a single-layer device as a 
result of the changes in Schottky barrier heights by 
strain-induced polarization. $\phi_d$ and $\phi_s$ represent 
the Schottky barrier heights formed at drain and 
source contacts, respectively. $E_p$ indicates the 
change in Schottky barrier height by piezoelectric 
polarization charges. 
Figure 4 | Array integration of CVD single-layer MoS₂ flakes. a, Optical 
image of an array consisting of four CVD single-layer MoS₂ flakes. 
b, Constructive voltage outputs by serial connection of the individual flakes in 
the circuit. c, Constructive current outputs by parallel connection of the 
individual flakes in the circuit. 


strain-induced polarization may offer a novel approach unavailable to 
conventional technologies using electrical control signals, without modifying 
the interface structure or chemistry, for implementing tunable 
electronics/optoelectronics, enhanced photovoltaics, hybrid spintronics 
and catalysis. 

In bilayer and bulk MoS₂ devices, the response is purely piezoresistive: 
the current increases symmetrically with applied strain, and the 
gap region in the I–V curve shrinks, consistent with a lowering of both 
source and drain Schottky barriers as a result of a decrease in the bandgap 
and/or a change in carrier density (see Fig. 3b for a bilayer device 
and Extended Data Fig. 9 for a bulk device). The gauge factors 
([ΔI(ε)/I(0)]/Δε) of the bilayer (~230) and bulk MoS₂ devices (~200) measured 
in our experiments were comparable to that reported for a state-of-the-art 
silicon strain sensor (~200), and exceeded the values of conventional 
metal strain gauges (~1–5) and a graphene strain sensor (~72) (ref. 26). 
This is a preliminary demonstration of large piezoresistivity in 2D MoS₂ 
and motivates further study for using it in highly sensitive strain sensing. 
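
The gauge factor defined above, [ΔI(ε)/I(0)]/Δε, is straightforward to evaluate from two current readings. The sketch below uses hypothetical current values chosen only so that the result lands near the ~230 quoted for the bilayer device; they are not measurements from the paper.

```python
def gauge_factor(i_unstrained, i_strained, delta_strain):
    """Gauge factor [dI(eps) / I(0)] / d(eps) from currents at fixed bias.

    delta_strain is a dimensionless fraction (0.005 means 0.5%).
    """
    if i_unstrained == 0 or delta_strain == 0:
        raise ValueError("unstrained current and strain change must be non-zero")
    relative_change = (i_strained - i_unstrained) / i_unstrained
    return relative_change / delta_strain

# Hypothetical readings: current rises from 1.00 nA to 2.15 nA under 0.5% strain.
gf_example = gauge_factor(1.00e-9, 2.15e-9, 0.005)
```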

Finally, we demonstrate the array integration of single-layer MoS₂ 
flakes to boost the piezoelectric output for energy conversion (Fig. 4). 
High-quality single-layer MoS₂ crystals were grown by seed-free chemical 
vapour deposition (CVD) on a Si/SiO₂ substrate, using a method reported 
previously. Single-domain triangles were then transferred from the 
growth substrate to flexible PET. Previous studies noted that molybdenum 
zigzag and sulphur zigzag are the two dominant morphologies 
of CVD MoS₂ triangles, and molybdenum zigzag triangles consistently 
have sharper and straighter edges than sulphur zigzag triangles. This 
morphological difference allowed us to easily identify the crystal orientation 
of CVD MoS₂ in optical images and pattern the electrodes accordingly 
(Fig. 4a). For this demonstration, four CVD MoS₂ flakes were 
chosen. By constructively connecting the four flakes either in series or 
in parallel, consistent enhancements in output voltages or currents were 
observed (Fig. 4b, c). Moreover, by destructively connecting devices 1 
and 2 either in series or in parallel, the combined outputs were the 
difference of the two individual outputs (Extended Data Fig. 10). This 
may open up possibilities of achieving practical technology at an even 
larger scale with 2D piezoelectric nanomaterials for powering nanodevices, 
tactile imaging and wearable electronics. Nevertheless, efforts 
are required to achieve a better understanding of the synthesis of CVD 
MoS₂ with more controllable properties and to optimize large-scale fabrication 
for improving device-to-device uniformity and achieving practical 
applications based on an array of single-atomic-layer MoS₂ devices. 

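
The constructive and destructive connections described above amount to a signed sum of the individual outputs: voltages add for a series connection and currents add for a parallel connection, with the sign set by how each flake is wired. The sketch below is an idealization that ignores internal resistance and timing mismatch between flakes, and the numbers are hypothetical.

```python
def combined_output(outputs, polarities):
    """Signed sum of individual outputs.

    outputs: peak voltages (series connection) or peak currents (parallel).
    polarities: +1 for a flake wired constructively, -1 for one wired in reverse.
    """
    if len(outputs) != len(polarities):
        raise ValueError("one polarity is needed per flake")
    return sum(p * x for p, x in zip(polarities, outputs))

# Hypothetical peak voltages (mV) for four flakes connected constructively in series.
series_constructive = combined_output([5.0, 6.0, 4.0, 5.0], [1, 1, 1, 1])

# Two flakes wired destructively: the combined output is their difference.
series_destructive = combined_output([5.0, 3.0], [1, -1])
```
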
The piezoelectricity and large mechanical stretchability and flexibility 
of single-atomic-layer MoS₂ demonstrate its potential applications 
in electromechanical sensing, wearable technology, pervasive computing 
and implanted devices. The integration of a MoS₂-based power 
source with graphene and other functional units or devices based on 
2D materials for energy storage, sensing, logic computation and communication 
on the same substrate may permit the construction of an 
atomically thin self-powered nanosystem that can operate self-sustainably 
without external bias by harvesting energy from the ambient environment, 
especially in circumstances in which other energy sources, 
such as solar or thermal energy, are not readily available. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


Determination of crystallographic orientation in MoS₂ flakes by optical SHG. 
The SHG measurements were performed in reflection geometry (Extended Data 
Fig. 1b). The incident excitation beam was normal to the sample. The pump radiation 
was supplied by a mode-locked Ti:sapphire oscillator operating at a repetition 
rate of 80 MHz. The pulses were of 90 fs duration and centred on a wavelength of 
810 nm. Using a ×100 objective, we focused the pump beam on the sample with a 
spot size of about 1 μm. We limited the average laser power to 1 mW. The retro-reflected 
SH signal was collected by the same objective, separated by a beam splitter 
and filtered by a short-pass optical filter (cutoff at 785 nm) to block the reflected 
fundamental radiation. An analyser was used to select the polarization component 
of the SH radiation perpendicular to the polarization of the pump beam. After 
dispersal in a spectrometer, the SH signal was detected by a liquid-nitrogen-cooled 
charge-coupled device camera. The SH character of the detected radiation was 
verified by its wavelength and quadratic power dependence on the pump intensity. 
In our setup we could freely rotate the samples to obtain the orientation dependence 
of the SH response. However, it should be noted that the lattice orientation 
cannot be uniquely determined by SHG because the SH signal remains the same if 
the lattice plane is rotated by 180° with respect to its normal direction. In other 
words, only the direction of the Mo–S bond axis can be obtained from the SHG 
signal, and no conclusion can be drawn about which side is Mo-edge or S-edge. 
Electrical output measurements. A programmable electrometer (part number 
6514, Keithley) with 200 TΩ input impedance was used for measuring voltage signals 
from the device. A low-noise current preamplifier (part number SR570; Stanford 
Research Systems, Inc.) was used for current measurements, with its direct-current 
input impedance at 1 MΩ for the sensitivity level used (10⁻¹¹ and 10⁻¹² A V⁻¹). 
A 10-Hz low-pass filter was used for both voltage and current measurements. 
Computer-controlled measurement software written in LabVIEW was used to collect 
and record the data. All measurements were made inside a home-built electrical 
cage. A linear motor (LinMot PS01-23×80) was used for applying programmed 
driving strain inputs. 

Estimation of strain induced in MoS₂ device. Because the dimensions of MoS₂ 
flakes (5 μm × 5 μm × (0.6–100 nm)) are significantly smaller than those of the 
PET substrates (2.5 cm × 2 cm × 500 μm), the mechanical behaviour of the substrate 
and entire device is not affected by the MoS₂ flake. Consequently, the Saint-Venant 
theory for small bending deformation can be adopted for estimating the induced 
strain in MoS₂ devices. The PET substrate can be approximated as a beam structure 
of thickness a, width w and length l. The origin for calculation is defined as the 
centre of the fixed edge. The x and z axes are along the width (w) and length (l) 
directions of the substrate, respectively (Extended Data Fig. 2a). Because the MoS₂ 
flake lies above the neutral strain axis of the entire device, the deflection of the substrate 
under an external force $f_y$, exerted by the linear motor, results in a pure elongation 
or contraction in the MoS₂ flake if no slippage is considered. It follows that the 
uniaxial in-plane strain induced in MoS₂ can be estimated by the $\varepsilon_{zz}$ component of 
strain in the PET substrate. Applying the Saint-Venant theory for small deflections 
of the beam, $\sigma_{zz} = -(f_y/I_{xx})\,y\,(l - z)$ and $\sigma_{xx} = \sigma_{yy} = 0$, in which $I_{xx}$ is the moment of 
inertia for the beam. Consequently, $\varepsilon_{zz} = \sigma_{zz}/E$, where E is the Young’s modulus of 
the substrate. Experimentally, it is more convenient to measure the lateral deflection 
$D_{\max}$ of the free end of the substrate rather than the external point force $f_y$. Since 
$D_{\max} = f_y l^3/(3 E I_{xx})$ by classical beam mechanics, then 

$$\varepsilon_{zz} = -3\,\frac{y}{l}\,\frac{D_{\max}}{l}\left(1 - \frac{z}{l}\right).$$

Because the MoS₂ flake is lying at $y = \pm a/2$ and $z = z_0$, where $z_0$ is the distance 
between the fixed edge and the MoS₂ flake, which can be measured in the experiment, 
the strain induced in the MoS₂ flake can be estimated by 

$$\varepsilon = \mp 3\,\frac{a}{2l}\,\frac{D_{\max}}{l}\left(1 - \frac{z_0}{l}\right).$$

The negative sign is for compressive strain and positive sign is for tensile strain. All 
variables in the above equation can be readily measured experimentally. In the 
experiment, the linear motor used for inducing strain is first accelerated and then 
decelerated with a constant acceleration a. The maximum lateral deflection $D_{\max}$ of 
the free end of the substrate, the acceleration a and the hold times ($t_1$ and $t_2$) are 
known parameters and can be controlled by the linear motor (see Extended Data 
Fig. 2b, c). Therefore the driving signal (moving distance d) in one cycle can be 
mathematically described by the following equations: 

$$
d = \begin{cases}
\tfrac{1}{2} a t^2, & t < \sqrt{D_{\max}/a} \\
D_{\max} - \tfrac{1}{2} a \left(2\sqrt{D_{\max}/a} - t\right)^2, & \sqrt{D_{\max}/a} < t < 2\sqrt{D_{\max}/a} \\
D_{\max}, & 2\sqrt{D_{\max}/a} < t < 2\sqrt{D_{\max}/a} + t_1 \\
D_{\max} - \tfrac{1}{2} a \left(t - 2\sqrt{D_{\max}/a} - t_1\right)^2, & 2\sqrt{D_{\max}/a} + t_1 < t < 3\sqrt{D_{\max}/a} + t_1 \\
\tfrac{1}{2} a \left(t - 4\sqrt{D_{\max}/a} - t_1\right)^2, & 3\sqrt{D_{\max}/a} + t_1 < t < 4\sqrt{D_{\max}/a} + t_1 \\
0, & 4\sqrt{D_{\max}/a} + t_1 < t < 4\sqrt{D_{\max}/a} + t_1 + t_2
\end{cases}
$$

A representative plot is shown in Fig. 2a and Extended Data Fig. 2b. As a result of 
the large difference between the scales of a (5–10 m s⁻²) and $D_{\max}$ (<10 mm), the stretch 
and release edges of the strain curve are very sharp. Snapshots from the typical 
configurations of the linear motor are also included in Extended Data Fig. 2c. 
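
The strain estimate and the driving profile described above can be combined in a short sketch. The strain magnitude used here, 3 (a/2l)(D_max/l)(1 − z₀/l), is the expression that follows from the cantilever relations quoted in this section. The substrate thickness (500 μm) is taken from the text; treating 2.5 cm as the beam length, and the values chosen for the deflection, flake position and hold times, are assumptions made only for illustration.

```python
import math

def bending_strain(thickness, length, d_max, z0):
    """Strain magnitude at the beam surface: 3 (a / 2l) (D_max / l) (1 - z0 / l).

    All lengths in metres; z0 is the distance of the flake from the fixed edge.
    """
    return 3.0 * (thickness / (2.0 * length)) * (d_max / length) * (1.0 - z0 / length)

def motor_displacement(t, d_max, accel, t1, t2):
    """Piecewise driving signal d(t) for one periodic stretch/hold/release/hold cycle."""
    ramp = math.sqrt(d_max / accel)        # duration of each accelerate or decelerate phase
    period = 4.0 * ramp + t1 + t2
    t = t % period
    if t < ramp:                           # accelerate away from rest
        return 0.5 * accel * t ** 2
    if t < 2.0 * ramp:                     # decelerate into full deflection
        return d_max - 0.5 * accel * (2.0 * ramp - t) ** 2
    if t < 2.0 * ramp + t1:                # hold at full deflection for t1
        return d_max
    if t < 3.0 * ramp + t1:                # accelerate back towards rest position
        return d_max - 0.5 * accel * (t - 2.0 * ramp - t1) ** 2
    if t < 4.0 * ramp + t1:                # decelerate into rest position
        return 0.5 * accel * (t - 4.0 * ramp - t1) ** 2
    return 0.0                             # hold at rest for t2

# Illustrative (assumed) parameters: 5 mm deflection, 5 m/s^2, 1 s holds, flake at mid-length.
A_THICK, L_BEAM, D_MAX, Z0 = 500e-6, 0.025, 0.005, 0.0125
ACCEL, T1, T2 = 5.0, 1.0, 1.0

peak_strain = bending_strain(A_THICK, L_BEAM, D_MAX, Z0)
half_way = motor_displacement(math.sqrt(D_MAX / ACCEL), D_MAX, ACCEL, T1, T2)
held = motor_displacement(0.5, D_MAX, ACCEL, T1, T2)
released = motor_displacement(1.5, D_MAX, ACCEL, T1, T2)
```

Because each ramp lasts only a few tens of milliseconds for these values while the holds last a second, the resulting strain waveform is close to a square wave, in line with the sharp stretch and release edges noted above.
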
Power generation by piezoelectric polarization charges in single-layer MoS). 
When single-layer MoS, is subjected to tensile strain, effective piezoelectric charges 
are induced at the armchair edges as a result of the polarization of atoms in the 
strained crystal (Extended Data Fig. 4b). The negative polarization charges deplete 
the barrier interface and drive the electron flow from the left electrode to the right 
electrode through an external load (Extended Data Fig. 4b), giving rise to the current 
peak labelled ‘b’ in Extended Data Fig. 4. The resistance of the Schottky barrier is 
significantly high for voltages below a threshold value and thus blocks the flow of 
electrons through the wire (Extended Data Fig. 4b). The electrons accumulate at 
the interfacial region between the right electrode and the MoS₂; the effect of piezoelectric
polarization charges is balanced by the accumulated electrons and the
Fermi levels in the entire system reach a new equilibrium (Extended Data Fig. 4c).
When the tensile strain in the MoS₂ is released, the piezoelectric polarization charges
vanish immediately and the electrons previously accumulated at the right electrode 
flow back to the left electrode through the external load to return the system to the 
original state, resulting in the downward current peak labelled ‘d’ in Extended Data 
Fig. 4. The above process therefore performs one cycle of energy harvesting and 
conversion from the mechanical to the electrical domain by single-layer MoS₂.

The piezoelectric voltage constant g is the electric field generated by a piezoelectric
material per unit of mechanical stress applied or, alternatively, is the mechanical 
strain experienced by a piezoelectric material per unit of electric displacement applied. 
The first subscript to g indicates the direction of the electric field generated in the 
material, or the direction of the applied electric displacement. The second subscript 
is the direction of the applied stress or the induced strain, respectively. The strength 
of the induced electric field produced by a piezoelectric material in response to an 
applied physical stress is the product of the value for the applied stress and the value 
for g. For simplicity we can assume that the piezoelectric material is an insulator 
with no internal screening of the strain-induced polarization charges and that the 
Schottky barrier between the piezoelectric material and the metal electrode can 
fully block the electron injection from metal electrode back to the piezoelectric 
material. Therefore, if we apply a strain $\varepsilon$ to the piezoelectric material, the internal
piezoelectric field generated inside the piezoelectric material is given by $E = g\varepsilon Y$,
where g is the piezoelectric voltage constant and Y is the Young's modulus of the piezoelectric
material. Thus, the internal piezoelectric potential across the piezoelectric
material can be given by $V_{\mathrm{piezo}} = Eh = g\varepsilon Yh$, where h is the length of the material.
Under open-circuit conditions, the voltage between the two electrodes is this internal
piezoelectric potential. Therefore $V_{\mathrm{oc}} = V_{\mathrm{piezo}} = Eh = g\varepsilon Yh$.

We assume that the piezoelectric material is insulating and that the Schottky 
barrier between the piezoelectric material and the metal electrode can fully block 
the electron injection from the metal electrode back to the piezoelectric material. 
Thus, when the two electrodes are connected by an external load, the electrons will 
be driven to flow from one electrode to the other. These transferred electrons will 
generate another electric field, which will screen the original piezoelectric field 
generated by the strain. If the amount of charge transferred from one electrode to
the other is Q, the voltage generated by those charges can be shown to be
$Q/C_0$, where $C_0$ is the capacitance between the two electrodes, which can be roughly
estimated as a constant. Under short-circuit conditions, the total potential drop
between the two electrodes is 0. Thus, we have $V_{\mathrm{piezo}} = Q_{\mathrm{sc}}/C_0$. The short-circuit
transferred charge can be given by $Q_{\mathrm{sc}} = C_0 V_{\mathrm{piezo}} = C_0 g\varepsilon Yh$. Therefore the
short-circuit current can be shown to be

$$
I_{\mathrm{sc}} = \frac{dQ_{\mathrm{sc}}}{dt} = C_0\,gYh\,\frac{d\varepsilon}{dt}
$$


In our experiment, the equivalent resistance of the piezoelectric nanogenerator
(mainly contributed by the reverse-biased Schottky barrier) is ~80 GΩ when the
bias is close to zero (at ~20 mV), which can be obtained from Fig. 3a. Because the
largest load resistance used in the power output measurement (Fig. 2) is 2 GΩ, we can
consider the internal resistance of the piezoelectric nanogenerator to be infinite,
and the Schottky barrier between the piezoelectric material and the metal electrode
can fully block electron injection from the metal electrode back to the piezoelectric
material. Therefore the piezoelectric nanogenerator can be assumed to be purely
capacitive. When the nanogenerator is connected to a load resistor, the equivalent
circuit for the whole system is shown in Extended Data Fig. 4.

We first analyse the simple case in which the mechanical motion is a pure harmonic
vibration: $\varepsilon = A\sin(\omega t)$. From the above analysis, $V_{\mathrm{piezo}} = g\varepsilon Yh = gAYh\sin(\omega t)$,
and $C_0$ can be assumed constant, as discussed above. The whole system is a linear
time-invariant system, so the phasor method can be used to solve it. The
following equations for the output voltage and current with a load resistance of R
can be obtained:


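$$
I_R = \frac{1}{1/(j\omega C_0)+R}\,gAYh = gAYh\,\frac{\omega C_0}{1+\omega^{2}R^{2}C_0^{2}}\,(\omega RC_0+j)
$$

$$
V_R = \frac{R}{1/(j\omega C_0)+R}\,gAYh = gAYh\,\frac{\omega RC_0}{1+\omega^{2}R^{2}C_0^{2}}\,(\omega RC_0+j)
$$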

Therefore 


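$$
I_R(t) = gAYh\,\frac{\omega C_0}{1+\omega^{2}R^{2}C_0^{2}}\,\bigl(\omega RC_0\sin(\omega t)+\cos(\omega t)\bigr)
$$

$$
V_R(t) = gAYh\,\frac{\omega RC_0}{1+\omega^{2}R^{2}C_0^{2}}\,\bigl(\omega RC_0\sin(\omega t)+\cos(\omega t)\bigr)
$$

The short-circuit current is then

$$
I_{\mathrm{sc}} = gAYh\,\omega C_0\cos(\omega t)
$$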


Therefore the phase shift $\theta$ between the short-circuit current and the non-open-circuit
voltage (finite external load) can be obtained as $\theta = 90^{\circ} - \arctan(1/\omega RC_0)$. Therefore,
when R = 0 the phase shift between the output signal and the short-circuit
current is 0°, and when R = ∞ the phase shift between the output signal and the short-circuit
current is 90°. For other cases, the phase difference is between 0° and 90°.

When the applied strain is not a pure harmonic, it is still a periodic signal and can
therefore be represented as a Fourier series:

$$
\varepsilon = \sum_{k=1}^{\infty}\bigl[A_k\sin(k\omega t)+B_k\cos(k\omega t)\bigr]
$$

Because the system is still a linear time-invariant system, the output is the summation
of the outputs for each harmonic component. By a similar analysis, we can
still obtain a similar conclusion:


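$$
V_{\mathrm{piezo}} = g\varepsilon Yh = gYh\sum_{k=1}^{\infty}\bigl[A_k\sin(k\omega t)+B_k\cos(k\omega t)\bigr]
$$

$$
I_R(t) = gYh\sum_{k=1}^{\infty}\frac{k\omega C_0}{1+k^{2}\omega^{2}R^{2}C_0^{2}}
\Bigl[A_k\bigl(k\omega RC_0\sin(k\omega t)+\cos(k\omega t)\bigr)+B_k\bigl(k\omega RC_0\cos(k\omega t)-\sin(k\omega t)\bigr)\Bigr]
$$

$$
V_R(t) = gYh\sum_{k=1}^{\infty}\frac{k\omega RC_0}{1+k^{2}\omega^{2}R^{2}C_0^{2}}
\Bigl[A_k\bigl(k\omega RC_0\sin(k\omega t)+\cos(k\omega t)\bigr)+B_k\bigl(k\omega RC_0\cos(k\omega t)-\sin(k\omega t)\bigr)\Bigr]
$$

When R = 0,

$$
I_{\mathrm{sc}}(t) = gYh\,\omega C_0\sum_{k=1}^{\infty}k\bigl[A_k\cos(k\omega t)-B_k\sin(k\omega t)\bigr]
$$

When R = ∞,

$$
V_{\mathrm{oc}}(t) = gYh\sum_{k=1}^{\infty}\bigl[A_k\sin(k\omega t)+B_k\cos(k\omega t)\bigr]
$$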

When $R \ll |1/\omega C_0|$, it can be easily seen that $I_R(t)$ converges to $I_{\mathrm{sc}}$, and that when
$R \gg |1/\omega C_0|$, $V_R(t)$ converges to $V_{\mathrm{oc}}$.
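
The superposition argument can be illustrated with a short sketch that sums the single-frequency response over the harmonics of the strain. It assumes the same series source-capacitor-resistor model as above; the function name `load_current`, the lumped prefactor `gYh` and the sample coefficients are hypothetical and serve only to show the limiting behaviour.

```python
import math

def load_current(t, R, omega, C0, gYh, A, B):
    """Load current for a strain sum_k A[k-1]*sin(k*omega*t) + B[k-1]*cos(k*omega*t),
    obtained by adding the steady-state response of each harmonic."""
    total = 0.0
    for k, (a_k, b_k) in enumerate(zip(A, B), start=1):
        kw = k * omega
        x = kw * R * C0                       # dimensionless k*omega*R*C0
        gain = kw * C0 / (1.0 + x * x)
        # the cosine term is the sine response advanced by 90 degrees
        total += gain * (a_k * (x * math.sin(kw * t) + math.cos(kw * t))
                         + b_k * (x * math.cos(kw * t) - math.sin(kw * t)))
    return gYh * total

# Hypothetical coefficients and component values
A, B = [1.0, 0.5], [0.2, -0.3]
omega, C0, gYh = 2.0, 1e-12, 3.0
i_small_R = load_current(0.37, 1.0, omega, C0, gYh, A, B)
i_zero_R = load_current(0.37, 0.0, omega, C0, gYh, A, B)
print(i_small_R, i_zero_R)   # nearly identical when R is far below 1/(omega*C0)
```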

Consequently, on the basis of the above discussions for the two cases, there is a 
phase difference between the signals of the short-circuit current and non-open-circuit
voltage. The difference depends on the internal impedance of the piezoelectric
nanogenerator and the external load resistance. Similarly, there is also a difference
between the signals of the open-circuit voltage and the short-circuit current: the
piezoelectric open-circuit voltage signal peaks when the strain reaches a maximum 
value, and the piezoelectric short-circuit current signal peaks when the strain rate 
reaches a maximum. It should be noted that the phase difference highlighted by black 
dashed lines was obtained by theoretical derivation for an ideal case; it was not 
measured experimentally because of the limitation in the experimental setup. 
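Moreover, the following relationship between the acceleration a, the maximum speed
$v_{\max}$ and the total moving distance d (with maximum deflection $D_{\max}$) can be
obtained: $v_{\max}^{2} = 2a(D_{\max}/2) = aD_{\max}$. Because $\varepsilon \propto d$ (that is, $\varepsilon = kd$ with k a constant),

$$
I_{\mathrm{sc}} = \frac{dQ_{\mathrm{sc}}}{dt} = C_0\,gYh\,\frac{d\varepsilon}{dt}
$$

$$
I_{\mathrm{sc,max}} = C_0\,gYhk\,v_{\max} = C_0\,gYhk\sqrt{aD_{\max}}
$$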


Therefore, when we increase either the acceleration or the total moving distance
$D_{\max}$, the magnitude of the short-circuit current will increase. The hysteresis
observed for the short-circuit current and the open-circuit voltage under strain 
can also be understood from the above discussion (Extended Data Fig. 5a). 
Estimation of power generation efficiency by MoS₂. The total mechanical deformation
energy stored in the monolayer MoS₂ after being strained is calculated by
$W_m = LWE\varepsilon^{2}/2$, where E is the 2D elastic modulus of MoS₂ (130 N m⁻¹), L is the
distance between the two electrodes, and W is the width of the MoS₂ flake. The
total electric energy generated through the piezoelectric polarization can be obtained
as $W_e$ = 1.86 × 10⁻¹⁴ J. Therefore the corresponding energy conversion efficiency
($\eta = W_e/W_m$) is found to be ~5.08% for $\varepsilon$ = 0.53% ($W_m$ = 3.66 × 10⁻¹³ J).
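
The efficiency figure can be checked with a few lines of arithmetic. The sketch below takes the electrical energy as 1.86 × 10⁻¹⁴ J, the value consistent with the stated efficiency of about 5.08%, and also inverts the expression for the mechanical energy to obtain the electrode spacing times flake width that the quoted numbers imply. That area is an inference from the formula, not a dimension reported in the text.

```python
E_2D = 130.0        # 2D elastic modulus of MoS2 quoted in the text, N/m
strain = 0.53e-2    # applied strain (0.53 %)
W_m = 3.66e-13      # stored mechanical energy quoted in the text, J
W_e = 1.86e-14      # generated electrical energy, J (exponent inferred from the 5.08 % figure)

efficiency = W_e / W_m
# Inverting W_m = L*W*E*strain**2/2 gives the implied product L*W
area = 2.0 * W_m / (E_2D * strain**2)

print(f"efficiency = {efficiency:.2%}")
print(f"implied L*W = {area * 1e12:.0f} um^2")
```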
Strain-modulated carrier transport in the MoS₂ piezotronic effect and piezoresistive
effect. The observed difference in conduction behaviour under strain
between monolayer and bilayer/bulk MoS₂ is attributed to their fundamental distinction
in crystal symmetries. It has been theoretically predicted that MoS₂ with
an odd number of layers (point group D3h) will exhibit an intrinsic piezoelectric property
as a result of its lack of centrosymmetry in crystal structure, whereas piezoelectricity
will vanish in centrosymmetrically structured MoS₂ with an even number
of layers (point group D3d). Resulting from changes in band structure, charge
carrier density or the density of states in the conduction band of strained semiconductor
materials, the piezoresistive effect is symmetrical on the two end contacts
and has no polarity; this will not produce the function of a transistor. Piezoresistance
is a common feature of semiconductors such as Si and GaAs and is not limited to
piezoelectric semiconductors. The strain-induced modification of band structures
in MoS₂ has recently been reported with a small range of compressive strain (<2%)
increasing and tensile strain decreasing the bandgap for MoS₂ as a result of the
change in orbital overlap and hybridization by strain. The trigonal-prismatic
coordination between the molybdenum and sulphur atoms and the absence of inversion
symmetry give rise to piezoelectric polarization in strained monolayer MoS₂,
with the polarization charges induced at the zigzag edges. In general, the negative
piezoelectric polarization charges, and hence the negative piezopotential induced
at the semiconductor side near the interface of the local contact formed between
the metal electrode and an n-type semiconductor, can repel the electrons from the
interface, resulting in a further depleted interface and increased local barrier heights,
whereas the positive piezoelectric polarization charges and hence the positive piezopotential
created at the semiconductor side near the interface can attract the electrons
towards the interface, resulting in a less depleted interface and hence decreased
local barrier heights. The strain-induced polarization charges can therefore directly
affect the local contacts at the metal-MoS₂ interfaces by exerting substantial influences
on the concentration and distribution of free carriers in MoS₂ as well as on
the modulation of electronic charge in interface states or metal. The observed anisotropic
changes in current transport with applied strain are associated with variations
in Schottky barrier height (SBH) tuned by both the strain-induced change in band
structure and piezoelectric polarization charges at a reverse-biased Pd-MoS₂ barrier.
The contributions to changes in SBH at source and drain contacts from the
band structure effect share the same polarity: uniaxial compressive (<2%) or tensile
strain in the direction along the armchair edge increases or decreases the bandgap for
monolayer MoS₂ (refs 33-36), whereas the modulation of SBHs at both contacts
due to the polarization-induced surface charges and the corresponding adjustment
of electronic states at the interface possesses opposite polarities owing to the polarity
of induced piezoelectric charges. These piezoelectric polarization charges can effectively
modulate the local contact characteristics through an internal field, depending
on doping type, carrier density and the crystallographic orientation of the piezoelectric
semiconductor material as well as on the polarity of the applied strain.
Consequently, the transport of charge carriers across the metal-semiconductor contact
can be effectively modulated by the piezoelectric polarization charges, which
can be controlled by varying the magnitude and polarity of the externally applied
strain. The modulation or gating of the charge transport across the interface by the
strain-induced polarization charges is the core of piezotronics. The gauge factor
([ΔI(ε)/I(0)]/Δε) of the monolayer MoS₂ device for strain sensing has also been
characterized; the highest value is ~760, which exceeds the values of conventional
metal strain gauges (~1-5), a state-of-the-art silicon strain sensor (~200) and a
graphene strain sensor (~300), suggesting the potential for using monolayer
MoS₂ in highly sensitive strain sensing. The higher gauge factor in a monolayer
MoS₂ strain sensor than those for graphene, bilayer MoS₂ (~230) and bulk MoS₂
(~200) is attributed to enhancement by piezoelectric polarization. However, we
are not sure why bilayer devices show slightly higher gauge factors than bulk samples.
We think that the relative change in carrier density and/or mobility by strain is more
significant in bilayer devices than in bulk samples, which may be related to the
difference between the band structures of bilayer and bulk MoS₂. Last, the effect
from substrate contact with the bilayer structure may also contribute to the observed
large piezoresistance of the device. Moreover, the much smaller thickness and original
conductivity in bilayer devices may also have a role here. Both theoretical
calculations (for example, tight binding) and experimental characterizations are
required for a better understanding of the piezoresistive effect in bilayer MoS₂.
Internal screening of piezoelectric polarization charges in MoS₂ by free carriers.
The partial screening of strain-induced polarization charges by free carriers
has been widely observed for conventional piezoelectric semiconductors such as
ZnO (refs 38, 39). Because MoS₂ has a relatively small bandgap (~1.8 eV), it is
anticipated that the internal screening of piezoelectric charges should exist. Considering
the n-type doping characteristics of MoS₂, the positive polarization charges
induced by strain will be partly screened by the free electrons in MoS₂, whereas the
negative polarization charges will be preserved. Therefore it is still possible to observe
piezoelectric power output and piezotronic transport. The trend in band
diagrams shown in Fig. 3c is still valid if the MoS₂ is not heavily doped, except that
the changes in SBH induced by positive piezoelectric charges will be decreased.
Considering the much smaller dimensions of 2D materials than those of conventional
piezoelectric semiconductors such as ZnO nanowires, the carrier density or
even conductivity type in monolayer MoS₂ may be affected or modulated in a
more efficient or sensitive way by substitutional doping at both the Mo and the S
sites and by the adsorption of charged molecules. More in-depth investigations are
therefore needed in both theory and experiments to quantify the effect of internal
screening of piezoelectric polarization in 2D piezoelectric semiconductors.

Extended Data Figure 1 | Raman spectrum of MoS₂ flakes and setup for SHG measurement. a, Raman spectrum of MoS₂ flakes with different layer numbers.
b, Experimental setup for the SHG measurement.

Extended Data Figure 2 | Mechanical strain applied to MoS₂ device. a, Schematic drawing for estimating strain in MoS₂ device. b, Schematic plot of strain
driving signal from linear motor. c, Typical configurations for linear motor.

Extended Data Figure 3 | Piezoelectric open-circuit voltage and short-circuit current.

Extended Data Figure 4 | Mechanism of electrical power generation in single-layer MoS₂ due to the flow of electrons in external load driven by
piezoelectric polarization charges. The equivalent circuit of the piezoelectric nanogenerator is also shown.

Extended Data Figure 5 | Piezoelectric output of MoS₂ device with different
strain parameters. a, Short-circuit current-strain and open-circuit
voltage-strain hysteresis loops. Hold time $t_1 = t_2$ = 1 s and acceleration
a = 5 m s⁻² for the curve of 0.4 Hz; hold time $t_1 = t_2$ = 0.5 s and acceleration
a = 7.5 m s⁻² for 0.8 Hz; hold time $t_1 = t_2$ = 0.1 s and acceleration a = 10 m s⁻²
for 2.5 Hz. b, Electrical outputs from bare PET substrate without single-layer
MoS₂ under periodic strain (0.53%).

Extended Data Figure 6 | Circuit connection for measuring the power outputs on the external load and power delivered to the load at 0.53% strain.

Extended Data Figure 7 | Stability test of voltage output from single-layer
MoS₂ device. The frequency of 0.5 Hz was held for 300 min. The results
demonstrate good stability of the device in mechanical energy harvesting for
prolonged periods.

Extended Data Figure 8 | Angular dependence of SHG intensity
(perpendicular component) for three-layer and five-layer MoS₂. Samples of
even layers (two, four and six layers) give vanishing SHG intensity regardless of
their crystallographic orientation. θ denotes the angle between the fundamental
light polarization and the mirror plane of the lattice.

Extended Data Figure 9 | Transport characteristics of bulk device under
different uniaxial strains.

Extended Data Figure 10 | Electrical outputs when CVD devices 1 and 2 are destructively connected.

Identification of an iridium-containing compound
with a formal oxidation state of IX


Guanjun Wang, Mingfei Zhou, James T. Goettel, Gary J. Schrobilgen, Jing Su, Jun Li, Tobias Schlöder & Sebastian Riedel


One of the most important classifications in chemistry and within the
periodic table is the concept of formal oxidation states. The preparation
and characterization of compounds containing elements with
unusual oxidation states is of great interest to chemists. The highest
experimentally known formal oxidation state of any chemical element
is at present VIII, although higher oxidation states have been
postulated. Compounds with oxidation state VIII include several
xenon compounds (for example XeO₄ and XeO₃F₂) and the well-characterized
species RuO₄ and OsO₄ (refs 2-4). Iridium, which has
nine valence electrons, is predicted to have the greatest chance of being
oxidized beyond the VIII oxidation state. In recent matrix-isolation
experiments, the IrO₄ molecule was characterized as an isolated molecule
in rare-gas matrices. The valence electron configuration of iridium
in IrO₄ is 5d¹, with a formal oxidation state of VIII. Removal of the
remaining d electron from IrO₄ would lead to the iridium tetroxide
cation ([IrO₄]⁺), which was recently predicted to be stable and in
which iridium is in a formal oxidation state of IX. There has been some
speculation about the formation of [IrO₄]⁺ species, but these experimental
observations have not been structurally confirmed. Here
we report the formation of [IrO₄]⁺ and its identification by infrared
photodissociation spectroscopy. Quantum-chemical calculations were
carried out at the highest level of theory that is available today, and
predict that the iridium tetroxide cation, with a Td-symmetrical structure
and a d⁰ electron configuration, is the most stable of all possible
[IrO₄]⁺ isomers.

Iridium oxide cations were generated in the gas phase using a pulsed-laser
vaporization/supersonic-expansion source and were studied by
infrared photodissociation spectroscopy in the 850-1,600 cm⁻¹ region
as described previously. A typical mass spectrum of the iridium oxide
cations produced with O₂-seeded helium is shown in Fig. 1a. The spectrum
consists of a progression of peaks having different masses that
correspond to different [IrOₓ]⁺ species with up to six oxygen atoms.
Enhanced abundance in the mass spectrum was found for [IrO₄]⁺ and
the preferential formation of this cation indicates increased stability. Because
the dissociation energies of the [IrO₄]⁺ cations are significantly
greater than the infrared photon energies in the Ir=O and O-O stretching
frequency region (the infrared photons in the 900-1,200 cm⁻¹ region
have energies in the range of 10.8-14.4 kJ mol⁻¹; Supplementary Information),
the method of rare-gas atom predissociation is employed to
obtain the infrared spectra for these molecules. When argon was
used instead of helium, [IrO₄]⁺·Arₙ ions (n = 1, 2 and larger; the dot
denotes a weak bond) were produced (Fig. 1b) and the mass peaks corresponding
to the [¹⁹³IrO₄]⁺·Arₙ isotopomers were selected for photodissociation.
When the infrared laser is on resonance with one of the
vibrational fundamentals of an [IrO₄]⁺·Arₙ complex, the latter photodissociates
by eliminating an argon atom. The resulting predissociation
infrared spectra of [IrO₄]⁺ are shown in Fig. 2.
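
The quoted photon energies follow directly from the wavenumber range. The conversion below is a small sketch using the exact SI values of the Planck constant, the speed of light and the Avogadro constant; the function name is illustrative.

```python
H = 6.62607015e-34      # Planck constant, J s
C_CM = 2.99792458e10    # speed of light, cm/s
N_A = 6.02214076e23     # Avogadro constant, 1/mol

def wavenumber_to_kj_per_mol(wavenumber_cm):
    """Molar photon energy in kJ/mol for a wavenumber given in cm^-1."""
    return H * C_CM * wavenumber_cm * N_A / 1000.0

for nu in (900.0, 1200.0):
    print(f"{nu:.0f} cm^-1 -> {wavenumber_to_kj_per_mol(nu):.1f} kJ/mol")
```

A single photon in this region therefore carries far less energy than is needed to break a covalent Ir=O or O-O bond, but enough to detach a weakly bound argon atom, which is the basis of the tagging approach.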

The experimental spectrum of [IrO₄]⁺·Ar (Fig. 2a) consists of five absorptions
at 936, 944, 966, 1,047 and 1,054 cm⁻¹ (Table 1),
indicating that more than one isomer is experimentally observed, because
any [IrO₄]⁺ structure should at most have four vibrational fundamentals
in the Ir=O and O-O stretching regions. Experiments with
different time delays between expansion from the pulsed valve and vaporization
(Methods), and with different stagnation pressures, suggest that
the bands at 936 and 944 cm⁻¹ are due to the same species, whereas the
other three absorptions are caused by other isomers. All bands were shifted
to lower wavenumbers in the experiments using ¹⁸O₂ (Supplementary
Information), and the observed frequency shifts with ¹⁶O/¹⁸O isotopic
ratios in the range of 1.050-1.056 suggest that these bands originate from
Ir=O or O-O stretching vibrations.
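
For orientation, the isotopic ratio expected for an idealized, uncoupled stretching vibration can be estimated from reduced masses. The sketch below treats Ir=O and O-O as isolated diatomic harmonic oscillators and uses integer mass numbers; both are simplifying assumptions, so the results are rough limits rather than predictions for the real, coupled modes.

```python
import math

def isotope_ratio(m_partner, m_light=16.0, m_heavy=18.0):
    """nu(light)/nu(heavy) for a diatomic harmonic oscillator in which
    one oxygen atom of mass m_light is replaced by m_heavy."""
    mu_light = m_partner * m_light / (m_partner + m_light)
    mu_heavy = m_partner * m_heavy / (m_partner + m_heavy)
    return math.sqrt(mu_heavy / mu_light)

ir_o = isotope_ratio(193.0)        # Ir=O treated as an isolated 193Ir-O unit
o_o = math.sqrt(18.0 / 16.0)       # O-O with both atoms substituted
print(f"Ir=O: {ir_o:.4f}, O-O: {o_o:.4f}")
```

Both idealized ratios lie at or slightly above the top of the observed 1.050-1.056 range, as expected when the real modes mix some motion of other atoms into the vibration.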

Recent quantum-chemical calculations show that three low-energy
isomers are possible for a cation of [IrO₄]⁺ stoichiometry. These cations
have been reinvestigated by more accurate ab initio coupled-cluster calculations
with single and double excitations and perturbative-triples
corrections (CCSD(T)), as well as by ab initio multi-reference-based complete
active space perturbation theory (CASPT2) calculations (Methods).
On the basis of these calculations, the wavenumber of the O-O stretching
mode of the superoxide complex is predicted to be 1,486.3 cm⁻¹ at
density functional level with inclusion of dispersion corrections (B3LYP-D3)
or 1,458.7 cm⁻¹ (CCSD(T)) with appreciable infrared intensity.
Hence, this superoxide complex $[(\eta^{1}\text{-O}_2)\mathrm{Ir^{VI}O_2}]^{+}$ (Cs symmetry, ³A″
ground state) can be ruled out because of the absence of any observed
band in the experimental spectra above 1,100 cm⁻¹. The observed absorptions
probably come from the side-on O₂ complex $[(\eta^{2}\text{-O}_2)\mathrm{Ir^{VII}O_2}]^{+}$
(C2v, ¹A₁) and the tetroxide complex $[\mathrm{Ir^{IX}O_4}]^{+}$ (Td, ¹A₁), where the
latter is by far the most stable isomer according to high-level (coupled-cluster)
calculations (Fig. 3).

Figure 1 | Mass spectra of the iridium oxide cations. The cations are
produced by pulsed-laser vaporization of an iridium metal target in an
expansion of helium (a) or argon (b) seeded by dioxygen. The isotopic splitting
of iridium can clearly be resolved with the relative peak areas matching the
natural abundance isotopic distribution (¹⁹¹Ir, 37.3%; ¹⁹³Ir, 62.7%). m/z, mass/charge
ratio; intensity is shown in arbitrary units.

Figure 2 | Infrared photodissociation spectra of the [¹⁹³IrO₄]⁺·Arₙ
(n = 1-4) cations. The spectra are measured by monitoring the Ar
photodissociation channel: a, [¹⁹³IrO₄]⁺·Ar; b, [¹⁹³IrO₄]⁺·Ar₂;
c, [¹⁹³IrO₄]⁺·Ar₃; d, [¹⁹³IrO₄]⁺·Ar₄. Intensity is shown as the yield of
fragmentation ions normalized to the parent ion signal in percentage.

Following the general rules for the determination of formal oxidation
states, the experimentally identified cation [IrO₄]⁺ can be viewed as an
Ir(IX) species. This assignment is in line with the well-known isoelectronic
OsO₄ molecule, in which osmium undoubtedly exists in its oxidation
state VIII. A more detailed discussion of the assignment of oxidation
states based on natural population analysis and molecular orbitals is
provided in Methods.
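
The bookkeeping behind the formal oxidation states can be written out explicitly. The sketch below applies the usual counting convention (oxo O²⁻, side-on peroxo O₂²⁻, end-on superoxo O₂⁻) to the species discussed here; the dictionary and function names are illustrative, and the convention is the standard textbook one rather than the natural population analysis described in Methods.

```python
# Formal charge assigned to each ligand type under the usual convention
LIGAND_CHARGE = {"oxo": -2, "peroxo": -2, "superoxo": -1}

def formal_oxidation_state(total_charge, ligands):
    """Metal oxidation state = overall charge minus the summed ligand charges."""
    return total_charge - sum(LIGAND_CHARGE[name] * count for name, count in ligands.items())

species = {
    "tetroxide cation": (1, {"oxo": 4}),
    "side-on peroxo isomer": (1, {"peroxo": 1, "oxo": 2}),
    "end-on superoxo isomer": (1, {"superoxo": 1, "oxo": 2}),
    "neutral OsO4": (0, {"oxo": 4}),
}
for name, (charge, ligands) in species.items():
    print(name, formal_oxidation_state(charge, ligands))
```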


Table 1 | Observed and calculated vibrational frequencies of the
[IrO₄]⁺ and the face-coordinated C3v [IrO₄]⁺·Ar isomers

Isomer            [IrO₄]⁺ (cm⁻¹)               [IrO₄]⁺·Ar (cm⁻¹)                       Mode*
                  CCSD(T)†      B3LYP†         CCSD(T)†      B3LYP†         Exptl
[IrO₄]⁺           916.4 (0)     1,009.8 (0)    916.7 (0)     1,008.9 (0)    936     Sym. str. (A₁)
                  963.0 (35)    1,008.5 (39)   963.3 (13)    1,007.5 (40)           Antisym. str. (T₂)
                  963.0 (35)    1,008.5 (39)   963.5 (11)    1,009.0 (39)   944     Antisym. str. (T₂)
                  963.0 (35)    1,008.5 (39)   963.5 (11)    1,009.0 (39)           Antisym. str. (T₂)
[(η²-O₂)IrO₂]⁺    954.9 (55)    1,026.8 (70)   -             1,025.0 (72)   966     OIrO antisym. str. (B)
                  1,000.6 (11)  1,062.7 (12)   -             1,061.9 (19)   1,047   OIrO sym. str. (A₁)
                  996.5 (56)    1,032.8 (60)   -             1,027.9 (59)   1,054   O-O str. (A₁)

Infrared intensities are listed in parentheses in km mol⁻¹.

* Mode descriptions and symmetry labels for the free cations.

† Calculated at CCSD(T)/aug-cc-pVTZ-PP and B3LYP-D3/aug-cc-pVTZ-PP levels. The frequencies of
the edge-coordinated (η²) C2v structure are predicted at 916.8 cm⁻¹ (0 km mol⁻¹, A₁), 963.0 cm⁻¹
(12 km mol⁻¹, A₁), 963.4 cm⁻¹ (10 km mol⁻¹, B₂), 963.6 cm⁻¹ (12 km mol⁻¹, B₁) at the CCSD(T) level.
All experiments have been replicated more than ten times.

Figure 3 | Optimized structures and energetic ordering of the different
[IrO₄]⁺ isomers. The structures are optimized at the CCSD(T) level. The
barriers of the interconversion reactions are also shown. The relative energies
are calculated at the SO-DFT/B3LYP and CCSD(T) (in parentheses) levels,
with the energy of the most stable isomer ([IrO₄]⁺) set to zero. The symmetries
and ground states are as follows: [IrO₄]⁺ (Td, ¹A₁), [(η²-O₂)IrO₂]⁺ (C2v, ¹A₁),
[(η¹-O₂)IrO₂]⁺ (Cs, ³A″). Bond lengths are in pm; energies are in kJ mol⁻¹.
Red, oxygen atoms; grey, iridium atoms.


The second most stable isomer of [IrO₄]⁺ stoichiometry is the C2v-symmetrical
side-on complex $[(\eta^{2}\text{-O}_2)\mathrm{Ir^{VII}O_2}]^{+}$, which was also calculated
to have a closed-shell electron configuration. All three isomers were
calculated to be true minima on the potential energy surface and are energetically
separated by 89 and 40 kJ mol⁻¹, respectively, at the CCSD(T)
level (Fig. 3). Our calculations show that their energetic ordering is not
affected by weak argon coordination (Supplementary Information).
The barrier for the [(η¹-O₂)IrO₂]⁺ → [(η²-O₂)IrO₂]⁺ conversion was
calculated to be only about 30 kJ mol⁻¹ using density functional theory
with relativistic spin-orbit coupling effects (SO-DFT/B3LYP; Fig. 3 and
Supplementary Information). The barrier for the [(η²-O₂)IrO₂]⁺ →
[IrO₄]⁺ reaction lies at 150 kJ mol⁻¹ at the SO-DFT/B3LYP level (Fig. 3
and Supplementary Information). This barrier is remarkably high, but
nevertheless is plausible because the isomerization reaction involves the
electron transfer and cleavage of the relatively strong peroxide O-O
bond. The [Ir(η²-O₂)]⁺ → [IrO₂]⁺ isomerization reaction was computed
to have a comparable barrier.

From comprehensive analysis and quantum-chemical calculations,
we assign the 936 and 944 cm⁻¹ bands to the antisymmetric iridium oxide
stretching modes of the argon-tagged iridium tetroxide cation (Table 1
and Fig. 2a). Only the triply degenerate antisymmetric stretching mode
(T₂ mode) is infrared-active in the tetrahedral [IrO₄]⁺ cation. In the experimental
spectrum of [IrO₄]⁺·Ar, we identified two bands separated
by 8 cm⁻¹ using argon-atom tagging. Our theoretical calculations on all
three possible argon coordination modes to [IrO₄]⁺, namely face coordination
(η³, C3v), edge coordination (η², C2v) and vertex coordination
(η¹, C3v), show that only the face- and edge-coordinated isomers are
stable, with the face-coordinated isomer being slightly more stable than
the edge-coordinated isomer (2.0 kJ mol⁻¹ at CCSD(T) and 3.5 kJ mol⁻¹
at CASPT2). Therefore, both isomers may coexist under the experimental
conditions. As a result of symmetry reduction by argon coordination,
the triple degeneracy of the antisymmetric iridium oxide stretching modes
is lifted, and the T₂ mode splits into distinct modes. Calculations at the
B3LYP and CCSD(T) levels show very small mode splitting (0.2 cm⁻¹
at CCSD(T), 1.5 cm⁻¹ at B3LYP-D3 and 2.0 cm⁻¹ at SO-DFT/B3LYP
for the face-coordinated isomer) because the predicted O···Ar distances
are quite large and the [IrO₄]⁺ moiety in [IrO₄]⁺·Ar has essentially the
same structure as the free cation. Additional ab initio multi-reference-based
CASPT2 calculations found that there are multi-reference features
in [IrO₄]⁺ and [IrO₄]⁺·Ar. The optimized structures (Supplementary
Information) show that the O···Ar distances are reduced by comparison
with single-reference methods and that the IrO₄ fragment exhibits some
structural distortion from tetrahedral symmetry, which will surely cause
a larger mode splitting than is calculated at the B3LYP and CCSD(T)
levels of theory.

Further evidence for the above assignment is gained from additional
argon atom coordination. The infrared spectra of the [IrO₄]⁺·Arₙ cations
with n = 2-4 are shown in Fig. 2b-d. Owing to symmetry reduction,
further mode splitting is observed in the case of [IrO₄]⁺·Ar₂, where the
lower-wavenumber band is broad, suggesting the presence of unresolved
bands. The spectrum of [IrO₄]⁺·Ar₃ shown in Fig. 2c is about
the same as that of the single-argon-coordinated complex because the
[IrO₄]⁺ cation of the [IrO₄]⁺·Ar₃ complex can retain the same symmetry
as the [IrO₄]⁺ cation in [IrO₄]⁺·Ar. If the experimental conditions
are varied, a spectrum involving three distinct bands at 928, 938 and
944 cm⁻¹ can clearly be resolved for the [IrO₄]⁺·Ar₃ cation, suggesting
that additional isomers with mixed coordination modes can be formed
(Supplementary Information). When four argon atoms were tagged, only
one band at 939 cm⁻¹ was observed in the spectrum of the [IrO₄]⁺·Ar₄
cation (Fig. 2d). In this complex, the tetrahedral symmetry of [IrO₄]⁺ is
retained, and so no splitting due to symmetry reduction is expected.

Apart from the bands assigned to the iridium tetroxide cation, three
additional absorptions in the experimental spectrum of [IrO₄]⁺·Ar
were attributed to different vibrational modes of the [(η²-O₂)IrO₂]⁺
complex (Table 1 and Fig. 3). The assignment of the 1,054 cm⁻¹ band
to the O-O stretching vibration is consistent with the identification
of [(η²-O₂)IrO₂]⁺ as a cationic peroxide complex. The unprecedented
Ir(VII) oxidation state of [(η²-O₂)IrO₂]⁺ now closes the gap between
the well-known VI and the recently discovered VIII oxidation states of
this metal. Further evidence for the assignment of the vibrational bands
is provided by comparison with the isoelectronic compounds OsO₄ and
[(η²-O₂)OsO₂] previously investigated by matrix-isolation spectroscopy
(Supplementary Information). Although this [(η²-O₂)IrO₂]⁺ complex
was predicted at CCSD(T) level to be 89 kJ mol⁻¹ higher in energy than
the tetroxide cation isomer, both structures were experimentally observed,
most probably due to the relatively high barrier for the interconversion
between the two isomers (see above). With both the $[(\eta^{2}\text{-O}_2)\mathrm{Ir^{VII}O_2}]^{+}$
peroxo complex and the $[\mathrm{Ir^{IX}O_4}]^{+}$ tetroxide complex experimentally characterized
here, all possible positive oxidation states of iridium ranging
from I to IX are now known.

As well as this gas-phase spectroscopic characterization, experimental
attempts have been undertaken to isolate iridium compounds in the IX
oxidation state. Proposed syntheses of stable [IrO₄]⁺ salts by the
reaction of [O₂][SbF₆] or [O₂][Al(OC(CF₃)₃)₄] with IrO₂ were motivated by
the calculated Gibbs free energies (ΔG°) derived from the appropriate
Born–Haber cycles: the reactions forming [IrO₄][SbF₆] and
[IrO₄][Al(OC(CF₃)₃)₄] were predicted to be exergonic by −14 and
−66 kJ mol⁻¹, respectively. The use of the weakly coordinating anion
[Sb₂F₁₁]⁻ would also be expected to lead to an exergonic reaction having
a ΔG° value that is intermediate with respect to [SbF₆]⁻ and
[Al(OC(CF₃)₃)₄]⁻. Considering the possibility of a thermally unstable
product, the reactions of [O₂][Sb₂F₁₁] and IrO₂ were initially attempted
at low temperatures (−120 to −78 °C) in SO₂ClF and anhydrous HF solvents.
The SO₂ClF solution instantly turned bright purple, which was shown to
result from the interaction between O₂⁺ and SO₂ClF; that is, dissolution
of [O₂][Sb₂F₁₁] in SO₂ClF under similar conditions also yielded a bright
purple solution. Attempts to synthesize a stable [IrO₄][SbₙF₅ₙ₊₁] salt by
the reaction of [O₂][Sb₂F₁₁] and IrO₂ in superacidic mixtures of HF and
SbF₅ also gave purple solutions, but no evidence for [IrO₄]⁺ was found.
The purple solutions are probably attributable to polyoxygen fluoride
radicals, (O₂)ₙ₊₁F, as previously described. To further test the
oxidizability of IrO₂, the reaction of molten XeF₆ with IrO₂ was attempted
to determine whether an iridium oxide fluoride species in a higher
oxidation state could be formed. There was no apparent reaction even after
heating the reaction mixture to 100 °C for over one hour. However, the
addition of anhydrous HF to the aforementioned reaction mixture of IrO₂
and XeF₆ afforded a dark brown solution at 21 °C, from which crystals of
[Xe₂F₁₁][IrF₆] were grown. The crystal structure of [Xe₂F₁₁][IrF₆], and
related work, will be reported elsewhere. Although IrO₂ reacted with XeF₆,
the high lattice enthalpy associated with the rutile structure of IrO₂ may
significantly inhibit its reactivity with O₂⁺. Details of the
aforementioned attempts to synthesize an [IrO₄]⁺ salt are provided in
Supplementary Information.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Experimental details. Iridium oxide cations were generated in the gas phase using 
a pulsed-laser vaporization/supersonic expansion source. Reactive iridium atoms 
and cations were generated by the 1,064-nm fundamental of a Nd:YAG laser that 
was focused onto a rotating iridium metal target. Helium or argon gas, seeded with 
a few per cent of O₂, was expanded out of a pulsed valve (General Valve, Series 9) at
a stagnation pressure of 8–13 atm. The iridium oxide ions and their argon-tagged
complexes generated by laser vaporization were collisionally cooled in the expansion. 
After free expansion, the cations were skimmed into a second differentially pumped 
chamber, where the cations were pulse-extracted into a collinear tandem time-of- 
flight mass spectrometer. Ions were mass-selected by their flight times and then 
studied by infrared laser photodissociation spectroscopy using a tunable infrared 
optical parametric oscillator/optical parametric amplifier laser system (LaserVision, 
pumped by a Continuum Powerlite 8000 Nd:YAG laser). The photodissociation 
spectrum was obtained by monitoring the yields of the fragment ions as a function of 
the dissociation infrared laser wavelength and normalizing to the parent ion signal. 
Theoretical and computational details. The iridium oxides and argon-tagged
species were studied by using both density functional theory (DFT) and
wavefunction-based theory (WFT). The structures of all molecules were fully
optimized by relaxing all geometric parameters at the DFT level using the
B3LYP functional. The choice of this functional was based on its good
performance for comparable molecules. To determine the ground states of the
molecules, several spin multiplicities were calculated. In the case of the
weak argon complexes, an empirical correction for dispersive interactions
(D3) was included in the DFT calculations. Further symmetrical stationary
points on the potential energy surface were calculated within the
restrictions of the given point groups. Dunning’s correlation-consistent
triple-ζ basis sets (aug-cc-pVTZ) were used for both oxygen and argon
atoms. Scalar relativistic effects were considered by using relativistic
energy-consistent small-core pseudopotentials for the metal atom and the
corresponding aug-cc-pVTZ-PP basis sets. For brevity, this basis set
combination is referred to as aug-cc-pVTZ(-PP). In the DFT calculations,
averaged masses were used for all elements if not otherwise noted. All
standard DFT calculations were performed with the Gaussian 09 program
package, whereas the Turbomole V6.4 suite of programs was used for DFT-D3
calculations.

Beyond these calculations, the influence of spin-orbit coupling has been
investigated for the energy-minimized structures as well as for the energy
barriers. To estimate the interstate crossing energy barrier of the
transition from the [IrO₂(η²-O₂)]⁺ (d²) singlet to the [IrO₄]⁺ (d⁰)
singlet, a series of linear transit calculations were performed by using
the DFT/B3LYP method. The starting point of the linear transit energy
curves corresponds to the optimized [IrO₂(η²-O₂)]⁺ structure and the
ending point to the optimized [IrO₄]⁺ structure. The linear transit total
energies were obtained by constrained optimizations at each linear transit
coordinate along the distance from Ir to the centre of the η²-O₂
configuration. The Slater basis sets with the quality of triple-ζ plus two
polarization functions were used, with the frozen-core approximation
applied to the inner shells [1s²–4f¹⁴] for Ir and [1s²] for O. In these
calculations, the scalar relativistic and spin-orbit effects were taken
into account by the zeroth-order regular approximation. With inclusion of
scalar relativistic effects, both the singlet and triplet potential energy
surfaces were explored. To further investigate the spin-orbit effects,
unrestricted Kohn-Sham calculations with a non-collinear spin-orbit
approach were also used in the linear transit calculations. These linear
transit calculations were done with the Amsterdam Density Functional (ADF
2013.01) program.

Further ab initio WFT calculations were performed using coupled-cluster
theory at the CCSD(T) level, which used the B3LYP structures with retention
of the molecular symmetries. In the CCSD(T) calculations, spin-restricted
open-shell Hartree-Fock reference wavefunctions were used in the case of
open-shell electron configurations. The CCSD(T) calculations were done
using the frozen-core approximation with the 1s² (O), 1s²2s²2p⁶ (Ar) and
5s²5p⁶ (Ir) orbitals excluded from the evaluation of the correlation
energies. Stationary points on the potential energy surface were
characterized by harmonic vibrational frequency calculations for both the
¹⁶O and the ¹⁸O isotopomers. For all other elements, the mass of the most
abundant isotope was used in the CCSD(T) calculations. Ab initio
coupled-cluster calculations were done using the CFOUR program package.

To better account for the non-dynamic electron correlation arising from
electronic transitions from O 2p lone pairs to Ir 5d orbitals, we applied
ab initio complete active space second-order perturbation theory (CASPT2),
as implemented in MOLPRO 2008.1, to do ground-state geometry optimizations
of the [IrO₄]⁺ and [IrO₄]⁺·Ar cations using the same aug-cc-pVTZ(-PP)
basis sets. In the CASPT2 calculations of [IrO₄]⁺ with T_d symmetry, the
active space contains the lowest five virtual orbitals (that is, Ir–O
antibonding orbitals of e and t₂ symmetry) and the highest six occupied
orbitals of t₁ and t₂ symmetry (that is, non-bonding orbitals of O 2p
character), which gives 12 electrons in 11 orbitals, that is, CAS(12, 11).
Similar active spaces were chosen for [IrO₄]⁺·Ar isomers of C₃ᵥ, C₂ᵥ and
Cₛ symmetry, respectively. A larger active space containing the lowest four
unoccupied orbitals and all the occupied valence orbitals except the Ir
non-bonding 5d orbital of a₁ symmetry and the O–O σ-bonding orbital of a₁
symmetry, which includes 20 electrons and 14 orbitals, that is,
CAS(20, 14), was chosen for the [(η²-O₂)IrO₂]⁺ cation with C₂ᵥ symmetry.
In CASPT2 calculations, the 1s² (O), 1s²2s²2p⁶ (Ar) and 5s²5p⁶ (Ir)
orbitals were not correlated, to save computing time.
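
The electron and orbital counts quoted for the two active spaces can be checked with a short sketch. It assumes a closed-shell reference in which every occupied active orbital holds two electrons; the function name `active_space` is illustrative and is not part of MOLPRO or any other package. The count of ten occupied orbitals in the larger space is inferred from the stated totals (14 orbitals minus 4 unoccupied) rather than given explicitly in the text.

```python
from typing import Tuple


def active_space(n_occupied: int, n_virtual: int) -> Tuple[int, int]:
    """Return (electrons, orbitals) for a closed-shell active space.

    Assumes each occupied active orbital is doubly occupied.
    """
    electrons = 2 * n_occupied
    orbitals = n_occupied + n_virtual
    return electrons, orbitals


# [IrO4]+ (T_d): six occupied t1/t2 orbitals, five virtual e/t2 orbitals
tetroxide_cas = active_space(n_occupied=6, n_virtual=5)

# Peroxo isomer (C_2v): four virtual orbitals; ten occupied orbitals is an
# inference from the quoted CAS(20, 14), not a value stated in the text
peroxo_cas = active_space(n_occupied=10, n_virtual=4)

print("CAS(%d, %d)" % tetroxide_cas)
print("CAS(%d, %d)" % peroxo_cas)
```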

Assignment of oxidation states. Usually the formal oxidation state of a
central atom in a coordination sphere is defined as the charge of the
central atom when every ligand of the coordination sphere is removed in its
most stable form. The bonding electron pairs between the metal centre and
the ligands are therefore exclusively assigned to the more electronegative
fragment, resulting in negative oxidation states. For instance, for
fluorine and doubly bonded oxygen ligands the oxidation states of fluorine
and oxygen are −I and −II, respectively. According to these rules, the
oxidation number of iridium in the [IrO₄]⁺ cation is IX, whereas for the
isoelectronic OsO₄ molecule, an oxidation state of VIII is obtained for
osmium. However, the concept of oxidation states is a formal one and it has
to be kept in mind that calculated atomic charges cannot be used to
directly determine oxidation numbers. The former mostly depend on the
electronegativities of the ligands and not on the oxidation state of the
metal, as exemplified by the isoelectronic d⁰ anions [V(V)O₄]³⁻,
[Cr(VI)O₄]²⁻ and [Mn(VII)O₄]⁻, which all represent the highest possible
oxidation states of the metal elements, namely V, VI and VII, and for which
almost identical charges of 1.05, 1.09 and 0.96 were computed for the
central atoms. We note that the computed charge of Cr(VI) is, in fact,
higher than that of Mn(VII) (refs 4, 41). The calculated atomic charges at
the B3LYP/aug-cc-pVTZ(-PP) level using natural population analyses of
osmium and iridium in OsO₄ and [IrO₄]⁺ are 1.475 and 1.470, respectively.
These atomic charges amount to only a small fraction of the formal
oxidation state, which is, however, well established in the case of
Os(VIII)O₄, and there is thus no doubt about the assignment of the formal
oxidation state of [Ir(IX)O₄]⁺, even if the calculated charge of Ir(IX) is
slightly less than that of Os(VIII).
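
The counting rule described above can be written out as a small worked example. This is a minimal sketch of the formal bookkeeping only, assuming each oxo ligand is removed as O²⁻ and a side-on η²-O₂ unit is removed as peroxide, O₂²⁻; the function and constant names are illustrative.

```python
def formal_oxidation_state(total_charge, ligands):
    """Formal oxidation state of the central atom.

    total_charge: overall charge of the complex.
    ligands: list of (ligand_charge, count) pairs, where ligand_charge is
             the charge each ligand carries when removed in its
             closed-shell form.
    """
    ligand_charge_sum = sum(charge * count for charge, count in ligands)
    return total_charge - ligand_charge_sum


OXO = -2     # doubly bonded oxygen, removed as O(2-)
PEROXO = -2  # side-on eta2-O2 unit, removed as a whole as O2(2-)

ir_tetroxide = formal_oxidation_state(+1, [(OXO, 4)])            # [IrO4]+
os_tetroxide = formal_oxidation_state(0, [(OXO, 4)])             # OsO4
ir_peroxo = formal_oxidation_state(+1, [(OXO, 2), (PEROXO, 1)])  # [(eta2-O2)IrO2]+
permanganate = formal_oxidation_state(-1, [(OXO, 4)])            # [MnO4]-

print(ir_tetroxide, os_tetroxide, ir_peroxo, permanganate)
```

The results (IX, VIII, VII and VII) depend only on the overall charge and the ligand bookkeeping, which is why computed atomic charges such as the natural population values quoted above cannot be used in their place.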
Methane dynamics regulated by microbial
community response to permafrost thaw 
Permafrost contains about 50% of the global soil carbon. It is thought
that the thawing of permafrost can lead to a loss of soil carbon in the
form of methane and carbon dioxide emissions. The magnitude of
the resulting positive climate feedback of such greenhouse gas emissions
is still unknown and may to a large extent depend on the poorly
understood role of microbial community composition in regulating
the metabolic processes that drive such ecosystem-scale greenhouse
gas fluxes. Here we show that changes in vegetation and increasing
methane emissions with permafrost thaw are associated with a switch
from hydrogenotrophic to partly acetoclastic methanogenesis, resulting
in a large shift in the δ¹³C signature (10–15‰) of emitted methane.
We used a natural landscape gradient of permafrost thaw in northern
Sweden as a model to investigate the role of microbial communities
in regulating methane cycling, and to test whether a knowledge of
community dynamics could improve predictions of carbon emissions
under loss of permafrost. Abundance of the methanogen Candidatus
‘Methanoflorens stordalenmirensis’ is a key predictor of the shifts in
methane isotopes, which in turn predicts the proportions of carbon
emitted as methane and as carbon dioxide, an important factor for
simulating the climate feedback associated with permafrost thaw in
global models. By showing that the abundance of key microbial
lineages can be used to predict atmospherically relevant patterns in
methane isotopes and the proportion of carbon metabolized to methane
during permafrost thaw, we establish a basis for scaling changing
microbial communities to ecosystem isotope dynamics. Our findings
indicate that microbial ecology may be important in ecosystem-scale
responses to global change.

Multiple factors (including hydrology, vegetation, organic matter
chemistry, pH and soil microclimate) are affected by permafrost loss.
Together these factors regulate microbial metabolisms that release
carbon dioxide (CO₂) and methane (CH₄) from thawing permafrost
and are the basis for Earth-system model predictions of future CH₄
emissions. However, the role of microbial community composition
in regulating the metabolic processes that drive ecosystem-scale fluxes
is unknown.

At our study site in Stordalen mire, as in other thawing permafrost
peatlands, permafrost loss causes hydrological and vegetation shifts:
well-drained permafrost-supported palsas collapse into partly thawed
bogs dominated by moss (Sphagnum spp.) and fully thawed fens dominated
by sedges (such as Eriophorum angustifolium). Between 1970 and
2000, 10% of Stordalen’s palsa habitat thawed into such wetlands. This
transition drives an appreciable global warming impact because
CO₂-emitting palsa is converted to bogs and fens, which take up CO₂ but
emit CH₄ (a more potent greenhouse gas). The net effect is that the
high-methane-emitting fen contributes sevenfold as much greenhouse impact
per unit area as the palsa. This thaw progression is also associated with
an increase in overall organic matter lability, including a decrease in C:N
ratio and an increase in humification rates. We speculated, consistent
with previous studies of in situ bog and fen systems, that thaw
progression also facilitates a shift from hydrogenotrophic to
acetoclastic CH₄ production.

We used the distinct isotopic signatures of different microbial CH₄
production and consumption pathways to directly relate changes in CH₄
dynamics across the thaw gradient to underlying changes in the microbial
community. Methane produced by hydrogenotrophic methanogens
generally has lower δ¹³C and higher δD (δ¹³C = −110‰ to −60‰ and
δD = −250‰ to −170‰) relative to that produced by acetoclastic
methanogens (δ¹³C = −60‰ to −50‰ and δD = −400‰ to −250‰).
If methanotrophic microbes then oxidize CH₄, lighter molecules are
preferentially consumed, leaving the remaining CH₄ enriched in ¹³C
and D relative to the original CH₄ pool (expected patterns are shown in
Extended Data Fig. 1).

High-temporal-resolution measurements of the magnitude and isotopic
composition of CH₄ emissions, using a quantum cascade laser spectrometer
(Aerodyne Research Inc.) connected to autochambers, showed
that CH₄ emissions and their ¹³C content increased with thaw. Average
CH₄ fluxes increased from effectively zero at the intact permafrost palsa
site to 1.46 ± 0.37 mg CH₄ m⁻² h⁻¹ (all errors are reported as s.e.m.) at
the thawing Sphagnum site, and to 8.75 ± 0.50 mg CH₄ m⁻² h⁻¹ at the fully
thawed Eriophorum site (Fig. 1a; P < 0.001). The average δ¹³C of emitted
CH₄ also increased significantly, from −79.6 ± 0.9‰ in the Sphagnum
site to −66.3 ± 1.6‰ in the Eriophorum site (Fig. 1b; P = 0.03). This
consistent 10–15‰ divergence between sites was maintained through
the growing season but overlain by parallel fluctuations in δ¹³C-CH₄,
suggesting that weather patterns exerted a common influence over the
magnitude of isotopic fractionation. Porewater CH₄ isotopes showed a
similar pattern, with Eriophorum site porewater δ¹³C about 10‰ higher
than that of Sphagnum (July and August; Fig. 1b and Extended Data
Table 1). Porewater CH₄ was ¹³C-enriched by 5–20‰ relative to emitted
CH₄, as expected, as a result of diffusive fractionation (Methods,
equation (2)).

The apparent fractionation factor for carbon in porewater CH₄ relative
to CO₂, α_C (Methods, equation (2), and Extended Data Table 1), is
a related index of changes in CH₄ production. Greater fractionation is
associated with hydrogenotrophic methanogenesis and was found in
the thawing Sphagnum site (α_C = 1.053 ± 0.002). Significantly less
fractionation (P = 0.002) associated with more acetoclastic production or
with consumption by oxidation was found in the fully thawed Eriophorum
porewater (α_C = 1.046 ± 0.001). Here, increases in acetoclastic
production, not oxidation, best explain isotopic shifts because lower α_C
and higher δ¹³C-CH₄ are accompanied by significantly lower δD-CH₄
(Extended Data Fig. 1; P < 0.001). This is consistent with the pattern of
isotopes in CH₄ emissions, in incubations of Stordalen peat and in studies
showing bog-to-fen shifts from hydrogenotrophic to acetoclastic
methanogenesis.
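
As a rough illustration of how an apparent fractionation factor of this kind is computed, the sketch below uses the form commonly applied to paired porewater measurements, α_C = (δ¹³C-CO₂ + 1000)/(δ¹³C-CH₄ + 1000). The paper's own Methods equation is not reproduced in this section, so the expression is an assumption, and the input values are invented for illustration rather than taken from Extended Data Table 1.

```python
def apparent_fractionation(delta13c_co2, delta13c_ch4):
    """Apparent carbon fractionation factor between CO2 and CH4.

    Both inputs are delta-13C values in per mil. The formula is the form
    commonly used in the wetland literature and is assumed here.
    """
    return (delta13c_co2 + 1000.0) / (delta13c_ch4 + 1000.0)


# Hypothetical porewater values, for illustration only
alpha_example = apparent_fractionation(delta13c_co2=-15.0, delta13c_ch4=-65.0)
print(round(alpha_example, 4))
```

With CO₂ held fixed, a more ¹³C-depleted CH₄ pool gives a larger α_C, which is the direction associated above with hydrogenotrophic production.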

The CH₄ flux and isotope results provide compelling but indirect evidence
for changes in CH₄-cycling microbial communities with permafrost
thaw. These microbiological changes could be shifts in activity of
particular community members or changes in community composition. We
examined the role of community composition through 16S rRNA gene
amplicon sequencing. All known methanogens belong to a small number
of archaeal lineages within the Euryarchaeota. As expected, the shift
from CH₄-neutral intact permafrost palsa to CH₄-emitting wetland
corresponded to a substantial increase in the relative abundance of
methanogenic archaeal lineages (Fig. 1c and Extended Data Tables 2 and 3).
In the aerobic palsa and surface Sphagnum habitats, methanogens were found
in low relative abundance (average less than 0.6%), whereas the anaerobic
environments of the Eriophorum and deeper (below the water table)
Sphagnum habitats harboured communities with a substantially higher
relative abundance of methanogens (20–30%).

More significantly, the abundance of specific methanogenic lineages
varied across the thaw gradient (Fig. 1c and Extended Data Table 2) in
a manner corresponding to shifts in CH₄ production mechanism inferred
from the isotope data (Fig. 1b). At the partly thawed Sphagnum site, where
CH₄ isotopes were more hydrogenotrophic, the methanogen community
was dominated by hydrogenotrophic populations (at least 57% of
sequences). Members of the genus Methanobacterium and close relatives
of the recently described hydrogenotroph Candidatus ‘Methanoflorens
stordalenmirensis’ (a partial genome of which has also been identified
in incubations of Alaskan permafrost) were the most abundant phylotypes.
Although present, the metabolically versatile Methanosarcina (capable
of using a wide range of substrates, including acetate and hydrogen)
was much less abundant, averaging about 15% of the methanogen
sequences. At the fully thawed Eriophorum site (where isotope signatures
shifted towards acetoclastic), members of the obligately acetoclastic genus
Methanosaeta increased in abundance, comprising roughly one-third
of the methanogenic population. The remaining methanogenic community
at the Eriophorum site was taxonomically diverse and included
lineages also present at the Sphagnum site, as well as the
hydrogenotrophic genus Methanoregula (Extended Data Table 2). Differences
in the functional (hydrogenotrophic versus acetoclastic) composition of
the methanogen community between the sites were smallest in October,
coinciding with a convergence in δ¹³C-CH₄ (Fig. 1a and Extended Data
Tables 2 and 3).

Taken together, the isotope and microbial sequence data suggest that
shifts in microbial communities drive large, concordant variations in
CH₄ isotope biogeochemistry both seasonally and during permafrost
thaw, a novel observation at the ecosystem scale. The early successional
hydrogenotroph ‘M. stordalenmirensis’ dominates methanogenic metabolism
in the early stages of thaw, followed by the subsequent emergence
of a more diverse methanogen community, including obligate acetoclastic
methanogens. This microbial succession provides direct evidence
for how changes in ecosystem structure during permafrost thaw (plant
succession and increases in organic matter quality) translate into altered
CH₄ biogeochemistry.

Figure 1 | Increases in the magnitude and δ¹³C signature of CH₄ during
permafrost thaw track shifts in methanogen communities. a, Average daily
CH₄ emissions (error bars represent s.e.m.; n = 2–3). b, δ¹³C composition
of emitted and porewater CH₄ (error bars represent s.e.m.; flux n = 2–3,
porewater n = 6–9). c, Relative abundance of methanogenic groups as inferred
by taxonomic identity assigned from 16S rRNA amplicon sequencing (n = 3).
For the intermediate-thaw Sphagnum site, aerobic communities were
sampled above the water table; anaerobic communities were sampled below
the water table.

To quantify the effect of this shifting microbial community composition
on CH₄ isotopic patterns, we examined the relationships between
isotope fractionation (α_C), environmental conditions known or expected
to impact methanogenesis, and the relative abundance of specific
methanogenic lineages (Extended Data Table 4). Rather than a functional
group (such as hydrogenotrophic methanogens), a single organism, the
hydrogenotroph ‘M. stordalenmirensis’, was the best one-variable predictor
of isotopic patterns in the field (Fig. 2a). Several variables that typically
differentiate bogs from fens, including pH and water table depth,
were significant predictors of α_C; however, it was the relative abundance
of ‘M. stordalenmirensis’ that explained both the large range of α_C
observed at the Sphagnum site (R² = 0.7, P < 0.001) and patterns across
sites (R² = 0.6, P < 0.001). This suggests, contrary to the current practice
of focusing on the functional diversity of communities, that an individual
microbial lineage can have a disproportionate influence on ecosystem
biogeochemistry.

Figure 2 | Correlation between α_C and both Candidatus ‘Methanoflorens
stordalenmirensis’ and the anaerobic CH₄:CO₂ production ratio. a, The
relative abundance of a single methanogen, Candidatus ‘Methanoflorens
stordalenmirensis’, in the field was significantly correlated (linear regression,
P < 0.001, n = 41) with porewater effective fractionation (α_C), an isotopic
indicator of the methanogenic production pathway. b, Anaerobic incubations
of peat collected from a related thaw sequence at Stordalen mire (see methods in
ref. 9) show a significant correlation between α_C and the CH₄:CO₂ production
ratio (linear regression, P = 0.004, n = 9), suggesting that the abundance of
‘M. stordalenmirensis’ may be indicative of the proportion of organic matter
metabolized to CH₄.

Stepwise regression identified environmental variables (water table
depth, peat C:N ratio and peat δ¹³C) that improved model predictions
of α_C (to R² = 0.8, P < 0.001). Although confirming the central
importance of ‘M. stordalenmirensis’ in explaining variation in α_C
(Extended Data Table 5), this model also supports the hypothesis that
organic matter chemistry underlies shifts in CH₄ metabolism. The
dependence on the abundance of this lineage was evident despite the
relative rather than the absolute nature of the community composition
analysis, and the measurement of abundance rather than activity. We
speculate that direct measures of gene expression or metabolic activity
(metatranscriptomics and metaproteomics) will have an even stronger
association than community composition data with isotopic signatures.

Figure 3 | Simulated effect of CH₄ from different methanogen communities
in thawing permafrost on atmospheric δ¹³C-CH₄ in a box model of the
atmosphere. a, Modelled CH₄ emissions under high (red bounding lines) and
low (orange bounding lines) climate warming scenarios, and a range within
each (grey tint) spanning high and low C-release scenarios. b, Consequent
decreases (simulated by the intermediate emissions scenario indicated by the
red dashed line in a) in δ¹³C of atmospheric CH₄ due to emissions dominated
by hydrogenotrophic lineages, as in intermediate-thaw Sphagnum sites (green
line, δ¹³C = −80‰), or more by acetoclasts, as in fully thawed Eriophorum
sites (blue line, δ¹³C = −65‰). Atmospheric inversion models typically
assume that emissions have δ¹³C ranging from −60‰ (black line) to −65‰
(blue line). (The dotted horizontal line indicates the current detection limit for
atmospheric CH₄ isotopes.) These imply an underestimate of the effect on
atmospheric δ¹³C for the given emissions scenario (blue or green). c, To match
observed atmospheric isotopes, the box model would then require a
corresponding overestimate of CH₄ flux attributed to permafrost thaw (vertical
axis). The magnitude of the overestimate depends on the mismatch between
model-assumed isotopic composition (upper line, −60‰; lower line, −65‰)
and the actual isotopic composition produced by different communities, which
ranges here along the horizontal axis from −80‰ (hydrogenotroph-dominated,
as in the partly thawed Sphagnum sites) to −65‰ (acetoclastic, as
in the fully thawed Eriophorum sites).

Further analysis showed that α_C is significantly correlated (R² = 0.7,
P = 0.004) with the large range in CH₄:CO₂ production ratio (0.13–0.84)
measured in anaerobic incubations of Stordalen peat (Fig. 2b). It is
therefore likely that changes in the proportion of anaerobically
mineralized C that ends up as CH₄ (a key, but poorly constrained, parameter
in global CH₄ models) track the abundance of ‘M. stordalenmirensis’, which
acts as an index of the concerted changes in microbial community and
organic matter chemistry that together control the efficiency of carbon
metabolism.
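
The kind of one-variable linear regression and R² reported above can be sketched in a few lines of plain Python. The data points below are synthetic placeholders chosen only to exercise the function; they are not the incubation measurements, and the helper name `linear_fit` is illustrative.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic example: alpha_C against CH4:CO2 production ratio (made-up values)
alpha_c = [1.045, 1.048, 1.050, 1.052, 1.055, 1.058]
ch4_co2_ratio = [0.80, 0.62, 0.55, 0.40, 0.30, 0.15]

slope, intercept, r_squared = linear_fit(alpha_c, ch4_co2_ratio)
print(slope, intercept, r_squared)
```

A significance test (the P value) would need a t or F distribution on top of this and is left out of the sketch.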

Incorporating this understanding of the imprint of microbial communities
could be crucial to both improved model prediction of future
climate change CH₄ feedbacks and accurate attribution of the portion
of global atmospheric CH₄ change that is derived from permafrost thaw.
First, in simulating CH₄ cycling, Earth-system models typically prescribe
as fixed the fraction of anaerobically metabolized carbon that becomes
CH₄ (ref. 26). The lack of a basis for predicting this parameter across
ecosystems and in response to climate change limits current modelling
efforts. Our finding that the CH₄:CO₂ production ratio is highly variable
and predictable from isotopic indicators of methanogenic community
composition (Fig. 2b) supports the need to improve the representation
of microbial ecology in models. Although simulating microbial population
dynamics is beyond the scope of current global models, the identification
of microbial lineages that predict key parameters, such as the
CH₄:CO₂ ratio, provides insights that improve simulations of CH₄
biogeochemistry used to estimate global emissions.

Second, atmospheric inversion studies that use CH₄ mixing ratios and
isotopes to infer global sources and sinks of atmospheric CH₄ assume
that wetland microbial sources are dominated by acetate fermentation
(−58‰ to −65‰), and, critically, that isotopic signatures from biological
sources are constant over time. In contrast, we observed isotopic
compositions that varied across a gradient of permafrost thaw:
hydrogenotrophic methanogenesis was estimated to produce about 50–75% of
total CH₄ emission at Stordalen (Extended Data Table 6), with δ¹³C
averaging −80‰ (Fig. 1b). The hydrogenotrophic δ¹³C observed at Stordalen
and other Arctic wetlands may be a ubiquitous characteristic of thawing
permafrost, particularly during thaw stages that generate recalcitrant
organic matter, such as that observed at Stordalen in the
intermediate-thaw Sphagnum site.
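
One simple way to think about partitioning emitted CH₄ between the two production pathways is a two-end-member isotope mixing calculation. The sketch below is not the method behind Extended Data Table 6, which is not described in this section. It assumes linear mixing in δ¹³C, ignores oxidation and transport fractionation, and uses hypothetical end-member values picked from within the ranges quoted earlier for hydrogenotrophic and acetoclastic CH₄.

```python
def hydrogenotrophic_fraction(delta_observed, delta_hydro, delta_aceto):
    """Fraction of CH4 from the hydrogenotrophic end-member.

    Linear two-end-member mixing in delta-13C (per mil):
        delta_observed = f * delta_hydro + (1 - f) * delta_aceto
    """
    if delta_hydro == delta_aceto:
        raise ValueError("end-members must differ")
    return (delta_observed - delta_aceto) / (delta_hydro - delta_aceto)


# Hypothetical end-members (assumptions, not values from the paper)
DELTA_HYDRO = -90.0
DELTA_ACETO = -55.0

# Illustrative input: an emitted-CH4 signature of -80 per mil
f_hydro = hydrogenotrophic_fraction(-80.0, DELTA_HYDRO, DELTA_ACETO)
print(round(f_hydro, 2))
```

The result is sensitive to the chosen end-members, so any fraction obtained this way should be read as a range rather than a point estimate.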

To test whether these observed thaw-induced changes in microbial
metabolism might be relevant for large-scale atmospheric methane dynamics,
we used a simple box model of atmospheric mixing (Methods,
equation (3)) to quantify the effect of different methanogen communities
within recently constructed scenarios of CH₄ emission from thawing
permafrost (Extended Data Fig. 2a, b). We found that if hydrogenotrophic
lineages regulated CH₄ isotope patterns in permafrost thaw generally,
as at Stordalen, then projected CH₄ emissions (Fig. 3a) would
produce larger decreases in δ¹³C of atmospheric CH₄ than expected
from current inversion model assumptions that acetoclasts dominate
emissions (Fig. 3b and Extended Data Fig. 2c, d). This in turn would
constrain our simple box model to substantially overestimate the amount
of CH₄ released from thawing permafrost and underestimate emissions
from non-wetland sources, most notably fossil fuels (Fig. 3c). The greater
the prevalence of hydrogenotrophic lineages in CH₄ emissions, the larger
will be the overestimate of fluxes from thaw (Fig. 3c). The numerical
size of the mis-estimation error here is illustrative; state-of-the-art
three-dimensional inversion models have spatially resolved constraints that
would probably force smaller flux mis-estimations. However, the general
implication is that microbial effects are sufficiently important that
accurate global accounting of the different sources of CH₄ under future
climate change can be improved by understanding the microbial community
dynamics underlying biological feedbacks in natural systems.
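
The direction of the flux mis-estimation can be seen from a first-order isotope mass balance. This is a sketch under stated assumptions and is not the paper's box model (Methods, equation (3), is not reproduced here). For a small added source, the shift in atmospheric δ¹³C is taken to scale with the emitted mass times the difference between the source and atmospheric signatures, and sink fractionation is ignored. The atmospheric value of about −47‰ is an assumed round number for present-day CH₄.

```python
def flux_overestimate_factor(delta_actual, delta_assumed, delta_atm=-47.0):
    """Ratio of inferred to true flux needed to match the same isotopic shift.

    First-order balance: shift ~ flux * (delta_source - delta_atm), so
    inferred_flux / true_flux =
        (delta_actual - delta_atm) / (delta_assumed - delta_atm).
    All delta values are in per mil; delta_atm is an assumed value.
    """
    if delta_assumed == delta_atm:
        raise ValueError("assumed source must differ from the atmosphere")
    return (delta_actual - delta_atm) / (delta_assumed - delta_atm)


# Hydrogenotroph-dominated emissions (-80) read by a model assuming -60 or -65
factor_vs_60 = flux_overestimate_factor(-80.0, -60.0)
factor_vs_65 = flux_overestimate_factor(-80.0, -65.0)
print(round(factor_vs_60, 2), round(factor_vs_65, 2))
```

Under these assumptions, the lighter the true source relative to the assumed one, the larger the factor, which is the qualitative pattern described for Fig. 3c. The magnitudes are illustrative only.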

By showing that the abundance of key microbial lineages can be used
to predict atmospherically relevant patterns in CH₄ isotopes and the
proportion of carbon metabolized to CH₄ during permafrost thaw, this
work establishes a basis for scaling changing microbial communities to
ecosystem-scale and global-scale atmospheric isotope dynamics. It also
highlights the central role of microbial ecology in ecosystem-scale
responses to global change and the benefit of incorporating microbial
dynamics into Earth-system models.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Site description and permafrost thaw. Stordalen is a sub-arctic palsa mire located
10 km east of Abisko in the discontinuous permafrost zone of northern Sweden
(68° 21′ N, 18° 49′ E, altitude 363 m above sea level). This work focuses on three
distinct subhabitats, common to northern wetlands and together covering about
98% of the mire’s surface: permafrost-dominated, well-drained palsas occupied by
feather mosses and ericaceous and woody plants, covering 49% of the mire;
intermediate permafrost sites with variable water table depth, dominated by
Sphagnum spp., covering 37% of the mire; and full summer-thaw, wet sites with
Eriophorum angustifolium, covering 12% of the mire. Between 1970 and 2000, as
permafrost thawed and palsas collapsed, Sphagnum sites and Eriophorum sites
expanded by 3% and 54%, respectively.

The formation of wetlands after permafrost thaw, as observed at Stordalen, is a 
widespread characteristic of peatlands affected by permafrost loss*?!"**. Thawing 
of ice-rich features results in peatland collapse and the formation of bogs and fens. 
At Stordalen, thaw is associated with a progression from ombrotrophic bogs to mi- 
nerotrophic fens due to thaw-induced subsidence increasing hydrological connec- 
tivity. A similar successional shift from bogs dominated by Sphagnum spp. to tall 
graminoid fens has been observed in other northern peatlands****°. More generally, 
landscape features and hydrological conditions dictate the characteristics and 
trajectory of wetland communities formed after permafrost thaw’. For example, 
rapid fen development is observed at the subsiding margins of permafrost plateaux”, 
whereas collapse bogs and thermokarst lakes often form within large, thawing 
peatland complexes”. Large uncertainty in model predictions of the extent and 
characteristics of wetland formation arising from permafrost thaw is a critical 
limitation to current understanding of carbon-climate feedbacks’. As demonstrated 
in this study, improved characterization and modelling of peatland transformation 
during thaw will be essential for accurately predicting post-thaw microbial communities 
and the resultant magnitude and isotopic composition of CH₄ emissions 
under climate change. 
Methane isotope systematics. We use standard δ notation for quantifying the 
isotopic compositions of CH₄ and CO₂: the ratio R of ¹³C to ¹²C (or D to H) in the 
measured sample is expressed as a relative difference (denoted δ¹³C or δD) from 
the Vienna Pee Dee Belemnite (VPDB) international standard material. For example, 
for C: 

$$\delta^{13}\mathrm{C} = \frac{R_{\mathrm{sample}} - R_{\mathrm{VPDB}}}{R_{\mathrm{VPDB}}} = \frac{R_{\mathrm{sample}}}{R_{\mathrm{VPDB}}} - 1 \qquad (1)$$

δ¹³C is often expressed in parts per thousand (‰). 
Isotopic fractionation in chemical reactions (including methanogenesis or 
methanotrophy) or due to diffusion may be quantified as 

$$\alpha = \frac{\delta_{\mathrm{source}} + 1}{\delta_{\mathrm{product}} + 1} = \frac{R_{\mathrm{source}}}{R_{\mathrm{product}}} \qquad (2)$$

For diffusive fractionation, R_source is taken to be the isotopic ratio in the concentrations 
of the gradient and R_product the ratio in the resultant net flux. Because 
diffusion discriminates against the heavy isotope, R_product < R_source, which implies, 
for example, that the isotopic ratio of porewater (the ‘source’) will be greater than 
that of the flux of gas diffusing out, as we see here (Fig. 1a). Methanogenesis and 
methanotrophy also discriminate against the heavier isotopes, so that R_product 
< R_source (and hence α > 1) for both C and H in methane. Note that α > 1 for 
methanotrophy implies that the products of CH₄ oxidation (CO₂ and H₂O) are 
lighter (have lower R) in both C and H relative to the source CH₄; however, mass 
balance then requires the residual methane not oxidized to become heavier in both 
C and H relative to the starting composition of the CH₄ pool before oxidation. 
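
Equations (1) and (2) can be checked numerically. The following Python sketch is illustrative only and is not part of the original analysis: the function names are hypothetical, the example δ values are invented, and the VPDB ¹³C/¹²C ratio of 0.0112372 is an assumed, commonly quoted value that is not given in the text.

```python
# Illustrative helpers for delta notation (eq. 1) and fractionation factors (eq. 2).
R_VPDB = 0.0112372  # assumption: commonly quoted 13C/12C ratio of the VPDB standard


def delta_from_ratio(r_sample, r_standard=R_VPDB):
    """Equation (1): relative difference of a sample ratio from the standard."""
    return r_sample / r_standard - 1.0


def ratio_from_delta(delta, r_standard=R_VPDB):
    """Inverse of equation (1): recover the isotope ratio from a delta value."""
    return (delta + 1.0) * r_standard


def fractionation_factor(delta_source, delta_product):
    """Equation (2): alpha = (delta_source + 1) / (delta_product + 1)."""
    return (delta_source + 1.0) / (delta_product + 1.0)


# Hypothetical example: a source at -15 per mil yielding a product at -65 per mil.
delta_source, delta_product = -0.015, -0.065
alpha = fractionation_factor(delta_source, delta_product)

# The same factor expressed with isotope ratios, R_source / R_product.
alpha_from_ratios = ratio_from_delta(delta_source) / ratio_from_delta(delta_product)
print(f"alpha = {alpha:.4f}")
```

Because the product is isotopically lighter than the source in this example, the resulting factor is greater than 1, as stated above for methanogenesis and methanotrophy.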

The degree of C isotopic fractionation between CO₂ and CH₄ differs between the two 
main biochemical pathways of methanogenesis, namely acetoclastic (CH₃COOH → 
CH₄ + CO₂) and hydrogenotrophic (CO₂ + 4H₂ → 2H₂O + CH₄). Carbon isotope 
fractionation (α_C) is greater for hydrogenotrophic than for acetoclastic methanogenesis, 
but α_H (hydrogen isotope fractionation) follows the opposite pattern: α_H 
(hydrogenotrophic) < α_H (acetoclastic) (Extended Data Fig. 1; ref. 19). Hence, 
variations in C and H isotopic compositions of CH₄ that arise from variations in 
methanogenic pathway will be anti-correlated: shifts from hydrogenotrophic to 
acetoclastic production will cause C isotope ratios to increase but H isotope ratios 
to decline, moving along a negatively sloped ‘production line’ in H-C isotope space 
(Extended Data Fig. 1). Isotopic variations that arise from variations in the degree 
of methanotrophy, by contrast, will be positively correlated: shifts towards increasing 
methanotrophy will cause both C and H isotope ratios to increase along a 
positively sloped ‘oxidation line’ (Extended Data Fig. 1). 

In a field study such as this, it is difficult to estimate fractionation factors directly; 
we therefore follow standard practice in the methane biogeochemistry literature (see, 
for example, refs 22, 38) and estimate the net or effective fractionation factor from 
in situ porewater data. For example, we estimate α_C, the effective fractionation 
factor for C in CH₄, by applying equation (2), setting δ_product = δ¹³C_CH4 and 
δ_source = δ¹³C_CO2, where δ¹³C_CH4 and δ¹³C_CO2 are the observed C compositions 
of CH₄ and CO₂, respectively**. Using CO₂ isotope composition for δ_source follows 
directly for hydrogenotrophic methanogenesis (for which CO₂ is the source C 
substrate) and has been found to work also in practice for acetoclastic methanogenesis, 
because porewater CO₂ arises primarily from respiration of organic matter 
(a non-discriminatory reaction), and so is typically isotopically indistinguishable 
from organic matter’. 
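
As a worked illustration of this estimate, the sketch below applies equation (2) to two rows of mean July 2011 porewater values copied from Extended Data Table 1. The function name is hypothetical. Because the tabulated α_C values are averages of per-sample factors, recomputing from mean δ values may differ from the table in the last digit.

```python
def effective_alpha_c(d13c_co2_permil, d13c_ch4_permil):
    """Effective carbon fractionation factor from porewater delta values (per mil)."""
    delta_source = d13c_co2_permil / 1000.0   # CO2 is treated as the source
    delta_product = d13c_ch4_permil / 1000.0  # CH4 is the product
    return (delta_source + 1.0) / (delta_product + 1.0)


# Mean July 2011 values (delta13C-CO2, delta13C-CH4) from Extended Data Table 1.
july_means = {
    "Sphagnum - M": (-15.7, -62.2),
    "Eriophorum - S": (-14.1, -52.1),
}

alphas = {site: effective_alpha_c(co2, ch4) for site, (co2, ch4) in july_means.items()}
for site, value in alphas.items():
    print(f"{site}: alpha_C = {value:.3f}")
```

The larger factor at the Sphagnum site is in the direction expected for a greater contribution from hydrogenotrophic methanogenesis.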

Autochamber measurements. The autochamber system at Stordalen mire has 
previously been described in detail for measurements of CO₂ and total hydrocarbons'*“°. 
In brief, a system of eight automatic gas-sampling chambers made of 
transparent Lexan was installed in the three habitat types at Stordalen mire in 2001 
(n = 3 each in the palsa and Sphagnum habitats, and n = 2 in the Eriophorum 
habitat). Each chamber covers an area of 0.14 m² (38 cm × 38 cm), with a height of 
25-45 cm, and is closed once every 3 h for a period of 5 min. The chambers are 
connected to the gas analysis system, located in an adjacent temperature-controlled 
cabin, by 3/8-inch Dekoron tubing through which air is circulated at approximately 
2.5 l min⁻¹. During the 2011 season the system was updated with a new 
chamber design similar to that described in ref. 41. The new chambers each cover 
an area of 0.2 m² (45 cm × 45 cm), with a height ranging from 15 to 75 cm depending 
on habitat vegetation. At the Palsa and Sphagnum sites the chamber base is flush 
with the ground and the chamber lid (15 cm in height) lifts clear of the base between 
closures. At the Eriophorum site the chamber base is raised 50-60 cm on Lexan 
skirts to accommodate vegetation of large stature. In addition, each chamber is 
instrumented with thermocouples measuring air and surface ground temperature, 
and the depth of the water table is measured manually three to five times per week. 
The Palsa site chambers are located within the palsa site in ref. 6 and correspond to 
the hummock site class (I) described in ref. 4. The Sphagnum site chambers are 
located within the bog site in ref. 6 or site S in ref. 9 and correspond to the semi-wet 
and wet site classes (II and III) described in ref. 4. The Eriophorum site chambers 
are located within the fen site in ref. 6 or site E in ref. 9 and correspond to the tall 
graminoid site class (IV) described in ref. 4. 

Quantum cascade laser spectrometer measurement and calibration. Methane 
fluxes and isotopes were measured with a quantum cascade laser spectrometer (QCLS; 
Aerodyne Research Inc.), deployed to Stordalen mire in June 2011. The QCLS 
instrument at Stordalen is a modification of the technology described in detail in 
ref. 42. In brief, the QCLS uses a room-temperature continuous-wave mid-infrared 
laser whose frequency was tuned to scan rapidly (900 kHz) across ¹²CH₄ and ¹³CH₄ 
absorption lines in the 7.5-µm region. The laser light enters a multipass sample cell 
(effective path length about 200 m) containing sample air at low pressure (about 
5 kPa) and is detected by a thermoelectrically cooled detector (no cryogens are needed). 
Aerodyne Research’s custom TDL Wintel software averages high-frequency spectra 
to produce independent ¹²CH₄ and ¹³CH₄ mixing ratios in the sample airstream 
at 1-s intervals. The ratio R of ¹³CH₄ to ¹²CH₄ can then be expressed in 
standard notation as δ¹³C, the part-per-thousand (‰) deviation of the measured 
ratio from the VPDB standard ¹³C/¹²C ratio R_VPDB, as in equation (1). 

Instrument precision in the field at Stordalen mire was assessed by using time-series 
measurements of calibration tank air over 30-40 min. The precision of δ¹³C-CH₄ 
measurements using a 1-s integration time was 1‰. The Allan variance technique 
(used to characterize the minimum possible measurement error and the averaging 
time required to achieve it) showed that the minimum measurement error on 
δ¹³C-CH₄ was less than 0.2‰, achieved with 60 s of averaging time. This approaches 
the precision of comparable measurements made with gas chromatography-isotope 
ratio mass spectrometry (GC-IRMS). 
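
The Allan variance calculation can be sketched as follows. This is a minimal, non-overlapping estimator written for illustration, not the instrument software; the function name and the synthetic 1-Hz record (pure white noise with an assumed 1‰ standard deviation) are hypothetical.

```python
import math
import random


def allan_deviation(samples, m):
    """Non-overlapping Allan deviation for an averaging window of m samples."""
    n_blocks = len(samples) // m
    if n_blocks < 2:
        raise ValueError("need at least two averaging blocks")
    # Average the record in consecutive blocks of m samples.
    means = [sum(samples[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    # Allan variance is half the mean squared difference of adjacent block means.
    sq_diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return math.sqrt(0.5 * sum(sq_diffs) / len(sq_diffs))


# Synthetic 1-Hz delta13C record: white noise only (an assumption for illustration).
rng = random.Random(0)
record = [rng.gauss(-47.0, 1.0) for _ in range(60000)]

adev_1s = allan_deviation(record, 1)
adev_60s = allan_deviation(record, 60)
print(f"1 s: {adev_1s:.2f} per mil, 60 s: {adev_60s:.2f} per mil")
```

For white noise the deviation falls roughly as the inverse square root of the averaging time. A real instrument also drifts, so the curve eventually turns upward, and the minimum marks the optimal averaging time.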

We connected the QCLS to the main autochamber circulation with /%-inch 
Dekoron tubing and a solenoid manifold that enabled selection between the auto- 
chamber flow and an array of calibration tanks. During measurement periods, filtered 
(0.45-µm Teflon filter) and dried (Perma Pure PD-100T-24MSA) sample air flows 
at 1.4 standard litres per minute through the 2-litre QCLS sample cell volume at 
5.6 kPa. A downstream solenoid controls the QCLS return flow so that air recir- 
culates only during autochamber measurement periods; during calibration periods, 
exhaust air is vented to the room. 

Calibrations were performed every 60 min with three calibration gases spanning 
the observed concentration range (1.5-10 p.p.m.). The CH₄ concentration and δ¹³C 
composition of each calibration tank were determined by inter-calibration with a set 
of four well-characterized primary standard tanks. The primary tanks (Scott Marin, 
Inc.) were calibrated to the VPDB scale by means of flask samples, which were 
analysed by GC-IRMS at Florida State University (see porewater methods for GC-IRMS 
details). Each isotopologue, ¹²CH₄ and ¹³CH₄, was treated as an independent 
measurement and calibrated separately. For each calibration period a linear calibration 
curve was fitted for each isotopologue and the fit parameters were then linearly 
interpolated between calibration periods. The interpolated fit parameters were applied 
to the measured sample isotopologue mixing ratios to give calibrated measurements 
of ¹²CH₄, ¹³CH₄ and total CH₄, from which δ¹³C-CH₄ was calculated. 
Autochamber data processing. For each autochamber closure we calculated the 
flux and δ¹³C signature of emitted CH₄. Fluxes were calculated by using a method 
consistent with that detailed in ref. 44 for CO₂ and total hydrocarbons, using 
a linear regression of changing headspace CH₄ concentration over a period of 
2.5 min. Eight 2.5-min regressions were calculated, staggered by 15 s, and the most 
linear fit (highest r²) was then used to calculate flux. Keeling plots'**” using the 
entire closure period were used to estimate the isotopic composition of the emitted 
CH₄. As demonstrated in ref. 42, negligible error in measurement of CH₄ relative 
to that of δ¹³CH₄ for this instrumentation meant that type I regression was sufficient 
for the Keeling plot analysis. When the total change in headspace CH₄ was 
low”, there was high error in the Keeling intercept. We used a threshold of 3‰ 
uncertainty in the Keeling intercept as a cutoff for including isotopic values in the 
calculation of daily and annual averages, resulting in a total of 1,569 observations 
at the Sphagnum site and 1,168 at the Eriophorum site. No Palsa chamber closures 
had sufficient CH₄ flux to calculate δ¹³CH₄. Daily and whole-season average flux 
and isotopic composition for each habitat were calculated on the basis of individual 
chambers as the unit of replication (n = 3 for Palsa and Sphagnum, n = 2 
for Eriophorum). Significant differences in the magnitude and isotopic composition 
of CH₄ emissions were determined with Student’s t-test (isotopic composition) 
and analysis of variance (flux magnitude) in R"’, with seasonal averages for each 
autochamber as the unit of replication. Statistical significance was determined at 
α = 0.05. 
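
The two per-closure calculations described above (selecting the most linear of eight staggered regressions, and a Keeling plot intercept) can be sketched as follows. This is an illustrative reimplementation under assumptions, not the study's code: the function names and the synthetic closure are hypothetical, and the conversion from a concentration slope to an areal flux (which needs chamber volume, area, temperature and pressure) is omitted.

```python
def linear_fit(x, y):
    """Least-squares slope, intercept and r-squared of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, mean_y - slope * mean_x, r2


def most_linear_slope(times_s, conc, window_s=150.0, stagger_s=15.0, n_windows=8):
    """Slope of the staggered 2.5-min regression with the highest r-squared."""
    best = None
    for k in range(n_windows):
        start = times_s[0] + k * stagger_s
        idx = [i for i, t in enumerate(times_s) if start <= t < start + window_s]
        if len(idx) < 3:
            continue
        slope, _, r2 = linear_fit([times_s[i] for i in idx], [conc[i] for i in idx])
        if best is None or r2 > best[1]:
            best = (slope, r2)
    if best is None:
        raise ValueError("no window contained enough points")
    return best[0]


def keeling_intercept(conc, delta_permil):
    """Keeling plot: regress delta on 1/concentration; the intercept is the source."""
    _, intercept, _ = linear_fit([1.0 / c for c in conc], delta_permil)
    return intercept


# Synthetic 5-min closure: background air (1.9 p.p.m., -47 per mil) mixed with
# an emitted source at -70 per mil; all values are hypothetical.
times = [float(t) for t in range(0, 300, 5)]
conc = [1.9 + 0.01 * t for t in times]
delta = [-70.0 + 1.9 * (-47.0 - (-70.0)) / c for c in conc]

slope = most_linear_slope(times, conc)      # p.p.m. per second
source_delta = keeling_intercept(conc, delta)
```

In this noise-free example the Keeling intercept recovers the assumed source signature. With real data the uncertainty of the intercept grows as the headspace concentration change shrinks, which is why closures were screened on intercept uncertainty.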

Porewater sampling and analysis. Porewater samples were collected on 12 July 
2011, 15 August 2011 and 15 October 2011 at three locations adjacent to the Sphagnum 
and Eriophorum autochamber sites (Extended Data Table 1). Samples were col- 
lected by suction with a syringe through a stainless steel tube and filtered through 
25-mm diameter Whatman Grade GF/D glass microfibre filters (2-µm particle 
retention). Porewater pH was measured in the field (Oakton Waterproof pHTestr 
10; Eutech Instruments). Samples for the analysis of the concentration and δ¹³C of 
CH₄ and CO₂ were injected into 30-ml evacuated vials sealed with butyl rubber 
septa and frozen within 8 h of collection. The samples for δD-CH₄ were injected 
into 120-ml evacuated vials sealed with butyl rubber septa and containing 0.5 g 
of KOH. For δD-H₂O, water was filtered directly into 2-ml plastic screw-cap vials 
so that the vials were completely filled, then frozen within 8 h of collection. All 
samples were shipped frozen to Florida State University for analysis. 

Samples collected for analysis of CH₄ and CO₂ concentrations and δ¹³C were 
thawed, acidified with 0.5 ml of 21% H₃PO₄, and brought to atmospheric pressure 
with helium. The sample headspace was analysed for concentrations and δ¹³C of 
CH₄ and CO₂ on a continuous-flow Hewlett-Packard 5890 gas chromatograph (Agilent 
Technologies) at 40 °C coupled to a Finnigan MAT Delta S isotope ratio mass spectrometer 
via a Conflo IV interface system (Thermo Scientific). The headspace gas 
concentrations were converted to porewater concentrations on the basis of their 
known extraction efficiencies, defined as the proportion of formerly dissolved gas 
in the headspace. An extraction efficiency of 0.95 (based on repeated extractions) 
was used for CH₄, and the extraction efficiency for CO₂ relative to dissolved inorganic 
carbon (DIC) was determined on the basis of CO₂ extraction from dissolved 
bicarbonate standards”. 

Samples collected for analysis of δD-CH₄ were brought to atmospheric pressure 
with helium and measured on a gas chromatograph connected to a ThermoFinnigan 
Delta Plus continuous-flow isotope ratio mass spectrometer at the National High 
Magnetic Field Laboratory (Tallahassee, FL). δD of CH₄ is affected by δD of H₂O 
because CH₄ exchanges H atoms with water during methanogenesis””**”°, so measurement 
of δD-H₂O is necessary for the correct assignment of CH₄ production 
mechanisms and oxidation based on δD and δ¹³C of CH₄. Samples collected for 
δD-H₂O were measured on an LGR DT-100 liquid water stable isotope analyser at 
Florida Agricultural and Mechanical University (Tallahassee, FL). Data analysis 
for these samples was performed with an MS Excel template from the IAEA Water 
Resources Programme (http://www.iaea.org/water). 

Significant differences in α_C and δD and δ¹³C of porewater CH₄ between the 
Sphagnum and Eriophorum sites were determined with Student's t-test (α_C, δD-CH₄, 
δ¹³C-CH₄) and Hotelling’s t-test (multivariate δD and δ¹³C of CH₄) in R*. 
Statistical significance was determined at α = 0.05. 

Peat sampling. Peat samples were collected on 12 July 2011, 16 August 2011 and 
16 October 2011 at three locations adjacent to the Palsa, Sphagnum and Eriophorum 
autochamber sites. For the Sphagnum and Eriophorum sites, samples were collected 
at the same depths and locations as those used for porewater sampling (Extended 
Data Table 1); sample depths for the Palsa site are detailed in ref. 6. Peat cores were 
collected with a push corer 11 cm in diameter (Palsa and Sphagnum sites) or a 
10 cm × 10 cm Wardenaar corer (Eriophorum site). Cores were subsampled by 
depth and were subdivided in the field for microbial and chemical analysis, avoiding 
the outer 1 cm of the core. Samples for microbial analysis were placed in 
cryotubes, saturated with about 3 volumes of LifeGuard solution (MoBio Laboratories) 
and stored at −80 °C until processing. Samples for chemical analysis 
were placed in plastic bags and frozen until processing. 

Peat chemical analysis. For peat %C, %N, C:N ratio and δ¹³C measurements, 
5-10 g of peat was dried at 60 °C until completely dry (3-10 days) and ground to a 
fine powder. Subsamples of ground peat (80-100 µg for %C and δ¹³C analysis, and 
5-6 mg for %N analysis) were wrapped in tin capsules and analysed by combustion 
to CO₂ and N₂ at 1,020 °C in an automated CHN elemental analyser coupled with 
a ThermoFinnigan Delta XP isotope ratio mass spectrometer at the National High 
Magnetic Field Laboratory. Samples were run in non-dilution mode for carbon 
analysis and dilution mode (10) for nitrogen analysis. C:N was calculated as 
(%C)/(%N) (by weight) for corresponding pairs of subsamples. 

Small-subunit ribosomal RNA gene amplicon analysis. Sampling and extraction 
were performed as described previously*. Several additional samples were analysed 
for this paper; multiplex identifiers for those runs not reported in ref. 6 are 
provided in Extended Data Table 7. Small-subunit rRNA gene sequences were 
processed with APP 3.0.3 (https://github.com/Ecogenomics/APP). Homopolymer 
errors were corrected with Acacia”! and the resulting reads were processed by using 
the CD-HIT-OTU 0.0.2 pipeline with minor adjustments”. All reads were trimmed 
to 250 base pairs, and reads of less than 250 base pairs were discarded. Sequences 
were clustered at 97% identity and each cluster was assigned a taxonomy using 
BLASTN 2.2.22 (ref. 53) through the QIIME script assign_taxonomy.py™ against 
the GreenGenes October 2012 database clustered at 99% identity (Supplementary 
Table 1). The taxonomy of each methanogenic cluster was confirmed by using par- 
simony insertion in ARB**. Amplicon sequence clusters were identified as potential 
hydrogenotrophic or acetoclastic methanogens based on taxonomic relationship to 
known methanogenic lineages (Extended Data Table 2)”***°°. Within the order 
Methanosarcinales, lineages most closely related to Methanosaeta were classified as 
obligate acetoclasts, whereas those most closely related to Methanosarcina were con- 
sidered facultative acetoclasts, having the potential for both acetoclastic or hydro- 
genotrophic production”. 

Regression analysis. A stepwise regression approach with Akaike’s information 
criterion (AIC) as the model selection criterion was used to identify a subset of 
microbial and environmental predictor variables that best explained CH₄ metabolism 
patterns quantified as porewater α_C (Extended Data Table 5). Model selection 
was performed with the stepAIC function (MASS package) in R, and the relative importance of 
the predictor variables in the selected model was then calculated with the relaimpo 
R package**. Variables considered in the model selection process included the relative 
abundances of the six most abundant methanogen operational taxonomic units 
(comprising more than 93% of the total methanogen sequences; see Extended Data 
Table 2) plus soil temperature, water table depth, pH, porewater CH₄ and DIC 
concentration, and peat C:N, %C, %N and δ¹³C (Extended Data Table 1). Strong 
correlation between pH and both water-table depth and peat δ¹³C as well as peat 
%N and both %C and C:N meant that pH and %N were excluded from the regression 
analysis. Removal of non-significant predictor variables (DIC and relative 
abundance of an unidentified Methanobacterium spp. (otu-3636; Extended Data 
Table 2)) had a minimal effect on the model AIC value (less than 1); this simplified 
version was therefore selected as the optimal model (model 2 in Extended Data 
Table 5). Stepwise regression was also performed with δ¹³C-CH₄ as the dependent 
variable. This analysis resulted in a similar model outcome, but with a lower R² 
(model 1 in Extended Data Table 8). Stepwise regression analysis with environmental 
predictor variables and the relative abundance of the influential methanogen 
‘M. stordalenmirensis’ (otu-10747) as the dependent variable showed that patterns 
in this methanogen’s abundance were influenced by environmental conditions, 
particularly water table depth and peat chemistry (model 2 in Extended Data Table 8). 
However, these environmental variables alone cannot fully replace microbial data 
when modelling α_C. Stepwise regression analysis using only environmental variables 
to predict α_C yielded a model with a lower AIC and R² (model 3 in Extended 
Data Table 8). It is the combination of methanogen and environmental variables 
that yields a model that explains the most variability in α_C (Extended Data Table 5). 
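
The idea of AIC-based model selection can be illustrated with the simplified Python sketch below. It is not the authors' procedure: the study used R with selection in both directions, whereas this sketch performs forward selection only, with ordinary least squares solved from the normal equations and AIC taken as n ln(RSS/n) + 2k. All function names, predictor names and data are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ols_aic(columns, y):
    """Fit y ~ intercept + columns and return AIC = n ln(RSS/n) + 2k."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(design, y))
    return n * math.log(rss / n) + 2 * k


def forward_select(predictors, y):
    """Add the predictor that lowers AIC most until no addition improves it."""
    selected, best_aic = [], ols_aic([], y)
    while True:
        candidates = [name for name in predictors if name not in selected]
        scored = [(ols_aic([predictors[s] for s in selected + [name]], y), name)
                  for name in candidates]
        if not scored or min(scored)[0] >= best_aic:
            return selected, best_aic
        best_aic, chosen = min(scored)
        selected.append(chosen)


# Hypothetical data: the response depends on "signal" but not on "unrelated".
predictors = {
    "signal": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "unrelated": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
}
noise = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1, 0.1]
y = [2.0 * s + e for s, e in zip(predictors["signal"], noise)]

selected, aic = forward_select(predictors, y)
null_aic = ols_aic([], y)
```

Each added parameter costs 2 AIC units, so a predictor is retained only if it reduces the residual sum of squares enough to offset that penalty. The same trade-off underlies the comparison of models 1 and 2 in Extended Data Table 5.
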
Box model of atmospheric methane. The model used here was a one-box model 
simplified from the two-box model of ref. 57 (and also used in the methane inversion 
study”*): 

$$\frac{dM}{dt} = F_{\mathrm{CH_4}} - \lambda M \qquad (3)$$

$$\frac{d(RM)}{dt} = R_{\mathrm{CH_4}} F_{\mathrm{CH_4}} - \alpha_{\mathrm{OH}} \lambda (RM) \qquad (4)$$

where M is the mixing ratio (in p.p.b.v.) of CH₄ in the atmosphere, F_CH4 is the 
source flux of CH₄ to the atmosphere, λ is the atmospheric removal rate (1/9 yr⁻¹, 
assumed for this illustration to be fixed), the R terms are the ratio of ¹³CH₄ to 
¹²CH₄, as defined for equation (1), and α_OH is the isotopic fractionation (0.994, or 
about −6‰) for the atmospheric oxidation of CH₄ by OH (ref. 28). Baseline flux 
to the atmosphere (F_CH4) was set to 559 Tg CH₄, the 1980 value’®. The isotopic 
composition of CH₄ inputs to the atmosphere (R_CH4) was set to the equivalent of 
−53‰ to allow steady-state modern atmospheric CH₄ to have the observed value 
of about −47‰. 
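
A minimal numerical sketch of this one-box model is given below, written in Python rather than R and intended only as an illustration. Several choices are assumptions that are not in the text: the atmospheric burden is tracked in Tg rather than p.p.b.v. to avoid a unit conversion, the equations are integrated with a simple forward Euler step, the VPDB ratio is taken as 0.0112372, and the additional 30 Tg yr⁻¹ source at −80‰ is a hypothetical perturbation.

```python
R_VPDB = 0.0112372  # assumption: commonly quoted 13C/12C ratio of the VPDB standard


def delta_to_ratio(delta_permil):
    return R_VPDB * (1.0 + delta_permil / 1000.0)


def ratio_to_delta(ratio):
    return (ratio / R_VPDB - 1.0) * 1000.0


def run_box_model(years, dt=0.05, base_flux=559.0, base_delta=-53.0,
                  lam=1.0 / 9.0, alpha_oh=0.994,
                  extra_flux=0.0, extra_delta=-65.0,
                  m0=5031.0, delta0=-47.0):
    """Integrate the one-box mass and isotope balances with forward Euler.

    Returns the final burden (Tg CH4) and atmospheric delta13C (per mil).
    """
    m = m0                           # total CH4 burden
    rm = delta_to_ratio(delta0) * m  # the product R * M tracked by the isotope equation
    for _ in range(int(round(years / dt))):
        source_total = base_flux + extra_flux
        source_heavy = (delta_to_ratio(base_delta) * base_flux
                        + delta_to_ratio(extra_delta) * extra_flux)
        dm = source_total - lam * m               # mass balance
        drm = source_heavy - alpha_oh * lam * rm  # isotope balance
        m += dm * dt
        rm += drm * dt
    return m, ratio_to_delta(rm / m)


# Baseline run: relaxes to the steady state implied by the stated parameters.
burden, delta_atm = run_box_model(300.0)

# Hypothetical perturbation: an extra 30 Tg/yr source at -80 per mil.
burden_thaw, delta_thaw = run_box_model(300.0, extra_flux=30.0, extra_delta=-80.0)
print(f"baseline delta13C = {delta_atm:.2f}, perturbed = {delta_thaw:.2f} per mil")
```

At steady state the atmospheric ratio equals the source ratio divided by α_OH, which for a −53‰ source and α_OH = 0.994 gives about −47‰, consistent with the value quoted above. Adding an isotopically light source increases the burden and shifts atmospheric δ¹³C to more negative values.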

We implemented this model numerically in the R software package“, simulating 
the effect on the atmosphere of CH₄ emission due to permafrost thaw and 
partial decomposition of the 1,700 Pg C stock of permafrost C expected over the 
next 300 years, as summarized in refs 1, 2. High and low permafrost carbon release 
scenarios for both the high climate change scenario (Intergovernmental Panel on 
Climate Change (IPCC) scenario RCP8.5, leading to the release of 120-195 Pg C) 
and the low climate change scenario (IPCC scenario RCP2.6, approximated as 
one-third of the C release of the high scenario) (Extended Data Fig. 2a) generated 
CH₄ emissions (Fig. 3a) (based on 2.3% of released permafrost carbon emerging 
as CH₄ (ref. 2)) and corresponding impacts on the atmospheric concentrations of 
CH₄ (Extended Data Fig. 2b). We simulated the impacts of these emissions on the 
isotopic composition of atmospheric CH₄ by assuming that the δ¹³C of CH₄ emitted 
was in the range of what we report here for Stordalen mire, from very light (−80‰, 
like that measured at the Sphagnum site) to only moderately light (−65‰, like that 
measured at the Eriophorum site), giving a range of isotopic perturbations to 
atmospheric CH₄ under high climate change (Extended Data Fig. 2c) and under 
low climate change (Extended Data Fig. 2d). In all scenarios, the induced change in 
atmospheric δ¹³C is significantly larger than the atmospheric detection limit of 
0.1‰ (reported in ref. 28 and shown as a dotted horizontal line in Extended Data 
Fig. 2c, d). 

For the analysis shown in Fig. 3 we focused on a mid-range value of permafrost 
C release (high climate change scenario with low C release, 120 Pg total C by 2100), 
corresponding to emissions of 2.8 Pg C as CH₄ by 2100 (the dashed black and red 
line in Fig. 3a). (By comparison, the IPCC estimates that up to 5 Pg C may be 
released as CH₄ by 2100 (ref. 3).) We explored the misattribution of C release that 
would occur, by (mistakenly) assuming that the isotopic composition of emitted 
CH₄ was in the range of assumptions used in previous atmospheric inversions, 
from −60‰ to −65‰ (ref. 28), instead of the range measured at Stordalen mire 
(−65‰ to −80‰). We estimated the magnitude of misattribution (or error flux; 
Fig. 3c) by simulating the amount of additional carbon that would need to be 
released (at nominally assumed isotopic composition values of −60‰ or −65‰) to 
have the same effect on atmospheric composition as the carbon released under 
scenarios with isotopic compositions like those observed in the field. 
Extended Data Figure 1 | Expected and observed relationships between the 
δD and δ¹³C content of porewater CH₄. The thick grey arrow shows the 
expected pattern in H and C isotopes of CH₄ when variations are caused by 
shifts between acetoclastic (lower right) and hydrogenotrophic (upper left) 
production. The thin black arrows pointing to the upper right indicate the 
expected pattern in H and C isotopes of CH₄ when variations are caused by 
changes in CH₄ oxidation”’. The points are observed isotopic compositions of 
samples collected between July and October 2011 at the partly thawed 
Sphagnum and fully thawed Eriophorum sites; site averages are shown with 
error bars (error bars represent s.e.m.; n = 13 (Sphagnum) and 20 
(Eriophorum)). Although the scatter allows for some variation in both 
production and oxidation, the average Eriophorum porewater CH₄ had 
significantly more ¹³C and less D relative to Sphagnum porewater (Hotelling’s 
test, P = 0.0001, n = 33), indicating that the overall inter-site isotopic 
differences were due mostly to differences in the CH₄ production pathway 
rather than to differences in CH₄ oxidation. Additionally, in August there was a 
significant negative relationship between δ¹³C-CH₄ and δD-CH₄ of porewater 
samples collected across sites (dashed line, linear regression, R² = 0.5, P < 0.02, 
n = 12). Note that on the vertical axis δD-H₂O has been subtracted from 
δD-CH₄ to correct for the effect of δD exchange between H₂O and CH₄ 
(refs 20, 38, 50). Symbols distinguish site (Sphagnum, Eriophorum) and 
sampling month (July, August, October). 
Extended Data Figure 2 | Simulations, using high and low temperature and 
C release scenarios, of the effect of CH₄ release from thawing permafrost on 
atmospheric δ¹³C-CH₄. a, Scenarios of permafrost C release due to thaw (red 
bounding lines, high temperature; orange bounding lines, low temperature; 
the range in each case is defined by high and low C release scenarios). b, Impact 
on atmospheric methane mixing ratios (assuming that 2.3% of released C is 
emitted as methane). c, Impact of the high climate change scenario on 
atmospheric methane isotopes, assuming Eriophorum-like emissions (blue 
bounding lines, δ¹³C = −65‰), or assuming Sphagnum-like emissions (green 
bounding lines, δ¹³C = −80‰). d, As in c, except for the low climate change 
scenario. In c and d, dotted horizontal lines indicate the detection limit for CH₄ 
isotopes”. 
Extended Data Table 1 | Summary of porewater chemistry, average (s.e.m.), n = 3 

Sample            Depth (cm)   pH           CO₂ (mM)      CH₄ (mM)      δ¹³C-CO₂ (‰)   δ¹³C-CH₄ (‰)   α_C 
July, 2011 
Sphagnum - M      13           4.1 (0.06)   3.02 (0.78)   0.09 (0.04)   -15.7 (1.6)    -62.2 (3.8)    1.050 (0.005) 
Sphagnum - D      19           4.2 (0.09)   3.50 (0.57)   0.15 (0.05)   -14.1 (0.6)    -62.2 (4.5)    1.051 (0.005) 
Eriophorum - S    3            5.8 (0.09)   2.29 (0.92)   0.18 (0.12)   -14.1 (1.1)    -52.1 (0.5)    1.040 (0.001) 
Eriophorum - M    7            5.6 (0.06)   3.06 (0.77)   0.28 (0.07)   -12.9 (1.0)    -52.6 (0.6)    1.042 (0.001) 
Eriophorum - D    24           5.6 (0.03)   3.56 (0.80)   0.36 (0.07)   -11.6 (1.7)    -53.3 (1.9)    1.044 (0.004) 
August, 2011 
Sphagnum - M      21           4.2 (0.10)   4.89 (0.37)   0.23 (0.04)   -12.0 (1.5)    -66.7 (5.7)    1.059 (0.008) 
Sphagnum - D      26           4.1 (0.13)   4.80 (0.48)   0.23 (0.04)   -10.7 (1.6)    -69.9 (4.6)    1.064 (0.007) 
Eriophorum - S    3            5.7 (0.19)   1.62 (0.28)   0.06 (0.04)   -13.5 (0.5)    -60.0 (2.6)    1.049 (0.003) 
Eriophorum - M    7            5.7 (0.10)   1.93 (0.25)   0.10 (0.02)   -13.9 (0.4)    -56.6 (2.1)    1.045 (0.002) 
Eriophorum - D    26           5.6 (0.15)   3.58 (0.62)   0.31 (0.11)   -11.1 (2.4)    -55.9 (1.1)    1.047 (0.001) 
October, 2011 
Sphagnum - M      10           4.3 (0.06)   1.24 (0.42)   0.03 (0.02)   -16.4 (1.6)    -59.2 (6.5)    1.046 (0.006) 
Sphagnum - D      15           4.5 (0.10)   3.21 (0.90)   0.10 (0.04)   -13.8 (2.4)    -61.5 (2.7)    1.051 (0.0004) 
Eriophorum - S    3            5.9 (0.15)   2.15 (1.43)   0.19 (0.13)   -14.1 (1.0)    -56.4 (2.4)    1.045 (0.001) 
Eriophorum - M    7            5.9 (0.15)   2.71 (1.25)   0.29 (0.14)   -13.7 (1.6)    -57.8 (3.1)    1.047 (0.002) 
Eriophorum - D    26           5.7 (0.12)   3.84 (1.64)   0.53 (0.27)   -11.3 (3.1)    -58.1 (2.2)    1.050 (0.001) 
Extended Data Table 2 | Relative abundance, taxonomic classification and predicted methanogenic pathway of the dominant methanogen 
operational taxonomic units (OTUs) 


Methanoflorens   Methanobacterium   Methanoregula   Methanosarcina   Methanosaeta   Methanosaeta 
Sample (otu-10747) (otu-20819) 
Hydrogenotrophic   Hydrogenotrophic   Hydrogenotrophic   Acetoclastic (facultative)   Acetoclastic (obligate)   Acetoclastic (obligate) 
July, 2011 
Palsa-—S 0.0 0.0 0.0 0.0 0.0 0.0 
Palsa—M 0.0 0.0 0.0 0.0 0.0 0.0 
Palsa — D 0.0 0.4 0.0 0.0 0.0 0.0 
Sphagnum — S$ 0.3 0.4 0.0 0.1 0.0 0.0 
Sphagnum —M 4.0 12.9 0.0 3.4 0.0 0.0 
Sphagnum — D 16.4 5.8 0.0 3.3 0.0 0.0 
Eriophorum — S 1.0 2.7 5.8 0.7 4.5 1.8 
Eriophorum — M 5.3 3.7 4.0 2.2 5.0 2.7 
Eriophorum — D 8.3 1.6 1.9 0.6 4.2 1.2 
August, 2011 
Palsa-—S 0.0 0.0 0.0 0.0 0.0 0.0 
Palsa—M 0.0 0.0 0.0 0.0 0.0 0.0 
Palsa — D 0.0 0.0 0.0 0.0 0.0 0.0 
Sphagnum — S 0.1 0.4 0.0 0.2 0.0 0.0 
Sphagnum —M 11.6 4.0 0.0 1.9 0.0 0.0 
Sphagnum — D 32.1 3.1 0.0 1.4 0.0 0.0 
Eriophorum — S 0.6 2.1 3.6 0.4 3.3 1.0 
Eriophorum —M 6.3 6.1 5.1 2.6 9.0 3.9 
Eriophorum — D 6.5 0.3 3.4 1.2 1.7 0.6 
October, 2011 
Palsa-S 0.0 0.0 0.0 0.0 0.0 0.0 
Palsa—M 0.1 11 0.0 0.1 0.0 0.0 
Palsa — D 0.1 0.7 0.0 0.0 0.0 0.0 
Sphagnum — S 0.0 0.1 0.0 0.0 0.0 0.0 
Sphagnum —M 0.0 3.4 0.0 1.1 0.0 0.0 
Sphagnum — D 0.6 8.4 0.0 1.2 0.0 0.0 
Eriophorum — S 2.5 7 1.7 0.6 1.4 0.6 
Eriophorum — M 2.1 1.9 1.0 0.8 2.5 2.2 
Eriophorum — D 6.0 1.1 3.7 0.1 5.1 5.8 
Extended Data Table 3 | Relative abundance of methanogen functional groups within the Archaea 

Site                     Hydrogenotrophic   Acetoclastic (facultative)   Acetoclastic (obligate)   Other Archaea 
July, 2011 
Palsa                    35.9               2.9                          0.0                       61.2 
Sphagnum (aerobic)*      83.1               15.5                         0.0                       1.4 
Sphagnum (anaerobic)†    82.1               14.2                         0.0                       3.8 
Eriophorum               39.5               4.2                          21.4                      34.9 
August, 2011 
Palsa                    0.0                8.7                          0.0                       91.3 
Sphagnum (aerobic)*      68.2               30.7                         0.0                       1.1 
Sphagnum (anaerobic)†    91.2               6.1                          0.0                       2.8 
Eriophorum               39.5               5.1                          21.9                      33.5 
October, 2011 
Palsa                    56.5               2.6                          0.4                       40.5 
Sphagnum (aerobic)*      65.7               24.0                         0.7                       9.6 
Sphagnum (anaerobic)†    15.6               2.8                          2.6                       79.0 
Eriophorum               35.8               2.4                          27.6                      34.2 

* Above the water table. 
† Below the water table. 
Extended Data Table 4 | Results of linear regression analysis for predicting α_C from the relative abundances of methanogenic pathways, 
dominant methanogenic lineages and environmental variables (n = 41) 

Variable                   R²      F-statistic   p-value 
‘M. stordalenmirensis’     0.58    54.09         <0.001 
otu-3636*                  0.00    0.01          0.926 
otu-10220*                 0.12    5.36          0.026 
otu-20819*                 0.15    6.82          0.013 
otu-15150*                 0.06    2.27          0.140 
otu-7308*                  0.01    0.32          0.576 
Hydrogenotrophic           0.44    30.63         <0.001 
Acetoclastic (obligate)    0.12    5.23          0.028 
Water table depth          0.44    31.1          <0.001 
pH                         0.19    8.97          0.005 
Porewater CH₄ (mM)         0.00    0.07          0.796 
Porewater DIC (mM)         0.25    13.33         0.001 
Peat C:N                   0.00    0.17          0.682 
Peat %C                    0.02    0.75          0.393 
Peat %N                    0.00    0.14          0.709 
Peat δ¹³C                  0.13    5.99          0.019 

* See Extended Data Table 2 for taxonomic details. 


©2014 Macmillan Publishers Limited. All rights reserved 




Extended Data Table 5 | Results of stepwise multiple regression analysis for predicting α_C from relative abundances of methanogenic 
lineages and environmental variables 

Variable                   Coefficient   Std Error   t value   p value   Cumulative AIC 

Model 1 - stepwise regression, direction = both 
(R² = 0.81, F = 23.71 on 6 and 34 df, p <0.001) 
Water table depth          -0.0004       0.0001      -5.398    <0.001    -422.33 
‘M. stordalenmirensis’     0.0271        0.0084      3.221     0.002     -436.79 
C:N                        -0.0002       0.0001      -2.872    0.007     -438.80 
Peat δ¹³C                  0.0014        0.0006      2.516     0.017     -440.71 
DIC (mM)                   0.0007        0.0005      1.396     0.171     -445.42 
otu-3636*                  -0.0271       0.0161      -1.345    0.188     -445.58 
Intercept                  1.089         0.0167      65.193    <0.001    -445.71 

Model 2 - significant predictor variables from model 1 
(R² = 0.79, F = 33.71 on 4 and 36 df, p <0.001) 
Water table depth          -0.0004       0.0001      -5.202    <0.001    -425.11 
‘M. stordalenmirensis’     0.0351        0.0072      4.867     <0.001    -427.36 
C:N                        -0.0002       0.0001      -2.613    0.013     -440.97 
Peat δ¹³C                  0.0014        0.0006      2.470     0.018     -441.67 
Intercept                  1.089         0.0164      66.583    <0.001    -446.09 

* See Extended Data Table 2 for taxonomic details. 
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Extended Data Table 6 | Estimate of the relative contribution of hydrogenotrophic production to annual CH, emission at Stordalen mire 


Habitat Area Annual Flux Annual Emission Estimated Emission from | : 
(ha)* (g CH4 m*) (kg CHa) * Hydrogenotrophy (kg CHg yr’) 
Sphagnum 6.2 6.2 288.3 247.9% - 282.5 
Eriophorum 2.0 36.0 540.6 172.8'— 335.2" 
Total 828.9 420.7(51%) — 617.7 (75%) 


* Based on ref. 4; the Sphagnum site in this study is representative of the semi-wet and wet vegetation classes. 

+ Annual total hydrocarbon emissions from ref. 16 corrected for non-methane volatile organic compound (NMVOC) flux using the reported proportions (25% NMVOC for the Eriophorum site; 15% for the Sphagnum 
site). The magnitude of growing season CHa emissions measured in this study is comparable to the growing season CH, flux used in the estimate of annual flux in ref. 16. 

{Two approaches: isotopic, using mixing of acetoclastic (—60%0) and hydrogenotrophic (—80%o) sources to yield mean emitted 8'3C-CH,, and molecular, using the proportion of the methanogen community 
identified as hydrogenotrophic. 

§ Molecular approach: on average 86% of methanogen community in the anoxic CH,-producing peat was identified as hydrogenotrophic; all of the acetoclasts were facultative so this is probably an underestimate 
of potential hydrogenotrophic production. 

\|lsotopic approach: —79.6% = —80% x 0.98 + —60% x 0.02 (the bold number indicates the proportion of CH4 produced by hydrogenotrophy that would produce the measured 81°C-CHa). 

{Isotopic approach: —66.3%0 = —80% x 0.32 + —60%o x 0.68 (the bold number indicates the proportion of CH4 produced by hydrogenotrophy that would produce the measured 8'3C-CHa). 

#Molecular approach: on average 62% of the methanogen community was identified as hydrogenotrophic. 
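
The isotopic approach in the footnotes above is a two-end-member mixing calculation, which can be sketched as follows. This is a minimal illustration, assuming the end-member values given in the footnotes (−80‰ for hydrogenotrophic and −60‰ for acetoclastic CH4); the function and variable names are hypothetical and are not taken from the original analysis.

```python
def hydrogenotrophic_fraction(delta_measured, delta_hydro=-80.0, delta_aceto=-60.0):
    """Solve delta_measured = f * delta_hydro + (1 - f) * delta_aceto for f.

    All delta values are in per mil. The defaults are the end-member
    values quoted in the table footnotes (assumed here, not re-derived).
    """
    if delta_hydro == delta_aceto:
        raise ValueError("end-member signatures must differ")
    return (delta_measured - delta_aceto) / (delta_hydro - delta_aceto)


# Measured mean emitted values from the footnotes
f_sphagnum = hydrogenotrophic_fraction(-79.6)    # close to the 0.98 in the footnote
f_eriophorum = hydrogenotrophic_fraction(-66.3)  # close to the 0.32 in the footnote

# Scale the annual emission (kg CH4) by the hydrogenotrophic fraction
sphagnum_hydro_kg = 288.3 * f_sphagnum

print(f"Sphagnum fraction:   {f_sphagnum:.3f}")
print(f"Eriophorum fraction: {f_eriophorum:.3f}")
print(f"Sphagnum hydrogenotrophic emission: {sphagnum_hydro_kg:.1f} kg CH4")
```

The unrounded Eriophorum fraction differs slightly from the rounded proportion printed in the footnote, so an emission computed from it will not match the tabulated value exactly.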
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Extended Data Table 7 | Small-subunit rRNA gene amplicon multiplex identifiers (MIDs) used for each sample 


Sample name*      Run #†   Multiplex identifier (MID)
20110712_E_3_M    6        CGAGC
20110712_S_1_M    6        CGCAT
20110712_S_3_M    6        CGTAC
20110712_P_1_S    6        CGTGT
20110712_P_2_S    6        CTAGT
20110712_P_3_S    6        CTGAC
20110816_S_2_S    6        TACGC
20110816_S_1_D    6        TATGT
20110816_P_1_M    6        TCAGT
20111016_P_1_S    6        TCGAT


* Sample names are composed of the date of sampling, followed by P, S or E for Palsa, Sphagnum or Eriophorum sites, respectively; the number indicates the core within the site, and S, M or D indicates surface,
middle or deep sampling within the core, respectively.


† Samples were multiplexed in six separate runs, each time with samples not related to this study. The multiplex identifiers of the first five runs are given in ref. 6.

Extended Data Table 8 | Results of stepwise multiple regression analysis for predicting δ¹³C-CH4 from relative abundances of methanogenic
lineages and environmental variables (model 1), the relative abundance of ‘M. stordalenmirensis’ from environmental variables (model 2),
and αC from environmental variables (model 3)


Variable                   Coefficient   Std Error   t value   p value   Cumulative AIC


Model 1 - stepwise regression, dependent variable = δ¹³C-CH4, direction = both
(R² = 0.75, F = 21.25 on 5 and 35 df, p <0.001)

Water table depth          0.299         0.07        4.512     <0.001
‘M. stordalenmirensis’     -23.25        6.79        -3.426    0.002
Peat δ¹³C                  -1.51         0.54        -2.779    0.009
CH4 (mM)                   10.60         4.12        2.576     0.014
C:N                        0.12          0.05        2.149     0.039
Intercept                  -102.14       15.23       -6.705    <0.001


Model 2 - stepwise regression, dependent variable = ‘M. stordalenmirensis’, direction = both
(R² = 0.53, F = 7.77 on 5 and 35 df, p <0.001)

Water table depth          -0.0053       0.0015      -3.634    <0.001
C:N                        -0.0035       0.0010      -3.495    0.001
DIC (mM)                   0.0214        0.0106      2.025     0.050
%C                         0.0033        0.0018      1.799     0.081
Soil temperature           0.0059        0.0040      1.483     0.147
Intercept                  -0.0558       0.0805      -0.692    0.493


Model 3 - stepwise regression, dependent variable = αC, direction = both
(R² = 0.71, F = 21.71 on 4 and 36 df, p <0.001)

Water table depth          -0.0005       0.0001      -6.465    <0.001    -402.97
C:N                        -0.0003       0.0001      -4.514    <0.001    -416.18
DIC (mM)                   0.0015        0.0006      2.629     0.013     -427.36
Peat δ¹³C                  0.0017        0.0007      2.574     0.014     -427.63
Intercept                  1.0990        0.0192      57.396    <0.001    -432.56
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Limited impact on decadal-scale climate change 
from increased use of natural gas 


Haewon McJeon, Jae Edmonds, Nico Bauer, Leon Clarke, Brian Fisher, Brian P. Flannery, Jerome Hilaire, Volker Krey,
Giacomo Marangoni, Raymond Mi, Keywan Riahi, Holger Rogner & Massimo Tavoni


The most important energy development of the past decade has been
the wide deployment of hydraulic fracturing technologies that enable
the production of previously uneconomic shale gas resources in North
America. If these advanced gas production technologies were to be
deployed globally, the energy market could see a large influx of
economically competitive unconventional gas resources. The climate
implications of such abundant natural gas have been hotly debated.
Some researchers have observed that abundant natural gas substituting
for coal could reduce carbon dioxide (CO2) emissions. Others
have reported that the non-CO2 greenhouse gas emissions associated
with shale gas production make its lifecycle emissions higher than
those of coal. Assessment of the full impact of abundant gas on
climate change requires an integrated approach to the global
energy-economy-climate systems, but the literature has been limited in either
its geographic scope or its coverage of greenhouse gases. Here we
show that market-driven increases in global supplies of unconventional
natural gas do not discernibly reduce the trajectory of greenhouse
gas emissions or climate forcing. Our results, based on simulations
from five state-of-the-art integrated assessment models of
energy-economy-climate systems independently forced by an abundant gas
scenario, project large additional natural gas consumption of up to
+170 per cent by 2050. The impact on CO2 emissions, however, is
found to be much smaller (from −2 per cent to +11 per cent), and a
majority of the models reported a small increase in climate forcing
(from −0.3 per cent to +7 per cent) associated with the increased use
of abundant gas. Our results show that although market penetration
of globally abundant gas may substantially change the future energy
system, it is not necessarily an effective substitute for climate change
mitigation policy.

Five research teams projected the evolution of the future global energy 
system up to 2050 under two alternative assumptions about natural gas 
supply: ‘Conventional Gas’ and ‘Abundant Gas’ (Fig. 1 and Methods). 
Each natural gas supply curve was constructed based on the synthesis 
of natural gas supply and geographic distribution in the Global Energy 
Assessment (GEA) report.

The Conventional Gas scenario assumes the maximum recoverable 
resources to be 11,000 exajoules (EJ) in 2010, a total consistent with con- 
ventional resources that have extraction costs below $3 per gigajoule 
(GJ). (One EJ equals one quintillion (10¹⁸) joules and one GJ equals one
billion (10⁹) joules.) This supply curve reflects an estimate of economically
recoverable gas consistent with technology available before the
shale gas revolution.

The Abundant Gas scenario is characterized by both the global abun- 
dance of natural gas resources and substantially reduced extraction costs. 
This scenario envisions that advanced natural gas extraction technolo- 
gies become globally applicable beyond North America, allowing extrac- 
tion of previously uneconomic unconventional resources. To represent 
this scenario, we assumed that technological change halves the extraction 


cost in GEA between 2010 and 2050, allowing more than 30,000 EJ of 
cumulative natural gas to be produced at or below $3 per GJ, with addi- 
tional resources producible at higher prices. This rate of cost reduction is 
on the higher end compared to other studies. This scenario is designed
to provide a potential upper bound on global gas supply and should not 
be interpreted as the most likely case (see Methods for a broader range 
of supply assumptions). Furthermore, this rate of cost reduction is more 
aggressive than that of most low-carbon energy sources against which 
natural gas is competing (see Extended Data Table 1 for variance across 
the models). 

For both scenarios, we did not simulate future climate policies beyond 
those already in effect. The two scenarios therefore explore the degree 
to which market penetration of abundant gas alone can mitigate green- 
house gas emissions (see Methods). 

Five integrated assessment models (IAMs) are employed in this study:
BAEGEM (ref. 13), GCAM (ref. 14), MESSAGE (ref. 15), REMIND (ref. 16) and
WITCH (ref. 17). These IAMs belong to a class of models designed to assess
the implications of changes in the global energy system on climate forcing.
They have been used extensively to project emission scenarios for global and
regional assessments. For example, GCAM and MESSAGE provided two of the
four Representative Concentration Pathways (RCPs) used in the
Intergovernmental Panel on Climate Change’s Fifth Assessment Report.

The models integrate energy, economy, and climate systems to assess 
their interaction in a consistent framework. All models feature explicit 
representation of energy markets with price-responsive demand and 
supply for coal, oil, and gas, as well as for low-carbon energy sources.
The capability to simulate the effects of price changes on the scale and 
the composition of the future energy system is crucial for this study, 
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Figure 1 | Global natural gas supply curves in 2050. The current natural gas 
supply curves are provided by the Global Energy Assessment. Future cost reduction
assumptions are documented in the Methods. These supply costs are not the 
actual prices in the market place. The costs do not include taxes or royalties, nor 
do they include external environmental or social costs associated with gas 
production. $3 per GJ is equivalent to $3.2 per mmBtu. (One mmBtu is one
million British thermal units.) US dollars at 2007 constant prices. 
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5International Institute for Applied Systems Analysis, Schlossplatz 1, A-2361 Laxenburg, Austria. Centro Euromediterraneo sui Cambiamenti Climatici and Politecnico di Milano, Via Lambruschini 4b, 
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because these effects determine the changes in emissions and corre- 
sponding changes in climate forcing. 

The models are harmonized to share common natural gas supply curve 
assumptions, but otherwise differ widely in model architecture, geospa- 
tial resolution, socioeconomic assumptions, and technology projections 
(see Table 1 and Methods for a detailed description of the model differ- 
ences and similarities). To the extent that a similar result is produced by 
this diverse set of models, we are more confident that the result is not 
simply an idiosyncratic artefact of an individual modelling method, but 
rather is reflective of more fundamental forces. 

The models independently projected the future energy system for the 
two natural gas supply scenarios. All five models reported that the abun- 
dant gas supply leads to additional gas consumption, as well as addi- 
tional gas-fired electricity consumption, compared to the Conventional 
Gas scenario. However, the speed of divergence and the size of the dif- 
ference in gas consumption varied across models: from 11% in WITCH 
to 170% in REMIND in 2050 (Fig. 2a). The models agreed on the pat- 
tern of sector penetration. Power production showed the largest shift 
towards gas substituting for all other fuels, most prominently coal. Smaller 
shifts occurred in industry and buildings (Fig. 3a). The models also agreed 
that natural gas continues to have only a minor role in transportation.

Despite major changes to the global energy system and the substan- 
tial increase in natural gas consumption, the models agreed that addi- 
tional supply of natural gas in the energy market does not discernibly 
reduce fossil fuel CO2 emissions. Future CO2 emissions are similar in
magnitude with and without abundant gas, as the two emission trajec- 
tories continue to rise over time at similar rates (Fig. 2b). For GCAM, 
MESSAGE, and WITCH, the CO2 emissions for both scenarios were
within 2% of each other in 2050. The BAEGEM (11%) and REMIND 
(5%) models showed larger differences, but emissions increased—rather 
than decreased—under the Abundant Gas scenario. 

The results demonstrate that abundant gas will not necessarily reduce 
CO2 emissions. There are two forces at work: substitution and scale effect.
First, additional natural gas consumption largely substitutes for coal, but 
not exclusively. All five models found that gas substitutes for all other 
primary fuels—such as nuclear and renewables—although coal loses 
the largest market share in all models (Fig. 3a). In 2050, abundant gas 
on average substitutes for 18% of coal and 17% of low-carbon energy 
(10% and 8% respectively for the 2010-2050 cumulative total). Hence, 
the effect of natural gas on CO2 emissions is not based on the difference


Table 1 | Overview of the five modelling systems 
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between the emission factors of gas and of coal, but on the emission 
factor of gas relative to that of a broader basket of energy sources. The 
natural gas emission factor (56 kg of CO2 per GJ) is about half of the
coal emission factor (96 kg of CO2 per GJ). However, it is not substantially
lower than the average global CO2 emissions per unit of energy:
68 kg of CO2 per GJ (2050 model average). Consequently, even if natural
gas were to substitute for the entire global energy supply, CO2 emissions 
would decline by a maximum of 20%. Considering the model average of 
a 36% share for natural gas in the global energy system in 2050, the actual 
emission reduction effect would be a fraction of the maximum. 
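
The upper bound quoted above can be checked with a short calculation. This is a minimal sketch using only the emission factors stated in the text (56, 96 and 68 kg of CO2 per GJ); the variable and function names are illustrative, and the calculation ignores any change in total energy use.

```python
GAS_EF = 56.0       # kg CO2 per GJ, natural gas (from the text)
COAL_EF = 96.0      # kg CO2 per GJ, coal (from the text)
AVG_EF_2050 = 68.0  # kg CO2 per GJ, 2050 model-average global energy mix (from the text)


def max_reduction_if_all_gas(avg_ef, gas_ef):
    """Fractional CO2 reduction if gas replaced the entire energy supply
    at constant total energy use."""
    return 1.0 - gas_ef / avg_ef


gas_to_coal_ratio = GAS_EF / COAL_EF
upper_bound = max_reduction_if_all_gas(AVG_EF_2050, GAS_EF)

print(f"Gas emission factor relative to coal: {gas_to_coal_ratio:.2f}")
print(f"Maximum reduction with 100% gas:      {upper_bound:.1%}")
```

The bound comes out a little under 20%, in line with the "maximum of 20%" stated in the text, and it applies only in the limiting case where gas displaces every other source.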

Second, lower natural gas prices accelerate economic activity, reduce 
the incentive to invest in energy-saving technologies, and lead to an 
aggregate expansion of the total energy system: a scale effect. All models 
reported greater total global primary energy consumption (6% on aver- 
age) in the Abundant Gas scenario compared with the Conventional 
Gas scenario. All else being equal, increased energy use leads to increased 
CO2 emissions. All models reported that the combined effect of the two
forces (substitution and scale effect) does not result in a discernible
reduction in emissions and, in some cases, leads to increased CO2 emissions.

The emissions data from the models were processed through a simple
climate model, MAGICC6 (Model for the Assessment of Greenhouse-Gas
Induced Climate Change), to assess the combined effects of all greenhouse
gases and climate forcing agents (see Methods). The results echoed
those that were observed for CO2 emissions: climate forcing and associated
temperature change are not discernibly reduced under the Abundant
Gas scenario (Fig. 2c, d and Fig. 3b, c). Four models that endogenously 
model fugitive methane emissions reported increased climate forcing 
with abundant gas. This is largely driven by increased forcing from fugi- 
tive methane emissions associated with increases in gas consumption. 
The WITCH model, with exogenously specified methane emissions, 
reported virtually no change in forcing (—0.3%). 

Furthermore, four models reported the net change in forcing to be 
less than 3%. REMIND reported a radiative forcing increase of 7%; more
than half of that increase came from reductions in coal use and associated
aerosol emissions (reduced cooling). Two other models that also simulate
aerosol emissions endogenously (GCAM and MESSAGE) also reported a
reduced cooling effect from aerosols, but at a smaller scale.

The core finding of this research is that increases in unconventional 
gas supply in the energy market could substantially change the global 
energy system over the decades ahead without producing commensurate 


BAEGEM
Full name: BAEconomics General Equilibrium Model
Institutional steward: BAEconomics
Location: Kingston, Australia
Brief description: BAEGEM is a global dynamic-recursive, multi-region, multi-sector, computable general equilibrium model; it includes energy use, transformation and technology detail
Climate model used for this study: MAGICC 6.0 (natively integrated with MAGICC 5.3)
Detailed description: Ref. 13

GCAM
Full name: Global Change Assessment Model
Institutional steward: Pacific Northwest National Laboratory (PNNL)
Location: College Park, Maryland, USA
Brief description: GCAM is a long-term, global, dynamic-recursive, integrated assessment model of human and physical Earth systems, including 14 geopolitical and 151 land-use regions; it includes detailed technological representations for energy, land use, and the economy
Climate model used for this study: MAGICC 6.0 (natively integrated with MAGICC 5.3)
Detailed description: Ref. 14

MESSAGE
Full name: Model for Energy Supply Systems And their General Environmental impact
Institutional steward: International Institute for Applied Systems Analysis (IIASA)
Location: Laxenburg, Austria
Brief description: MESSAGE is an integrated assessment modelling framework, combining a global (multi-region, multi-sector) systems engineering, inter-temporal optimization model, an aggregated macroeconomic model, and a simple climate model
Climate model used for this study: MAGICC 6.0 (natively integrated with MAGICC 5.3)
Detailed description: Ref. 15

REMIND
Full name: Regional Model of Investments and Development
Institutional steward: Potsdam Institute for Climate Impact Research (PIK)
Location: Potsdam, Germany
Brief description: REMIND is a multi-regional, general equilibrium model of the global economy, energy, and climate systems; it includes energy supply, transformation technologies and demand details; intertemporal optimization methods solve for the equilibrium
Climate model used for this study: MAGICC 6.0
Detailed description: Ref. 16

WITCH
Full name: World Induced Technical Change Hybrid
Institutional steward: Centro Euromediterraneo sui Cambiamenti Climatici (CMCC)
Location: Milan, Italy
Brief description: WITCH is a multi-region, long-term, dynamic optimization, economy-energy-climate model, characterized by endogenous technological change and a game theoretic set-up with strategic interaction among regions
Climate model used for this study: MAGICC 6.0
Detailed description: Ref. 17
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Figure 2 | Comparison of the 
model results 2010-2050. a, Global 
natural gas consumption. The 
relatively small difference in gas 
production in WITCH (11%) 
becomes considerably greater in the 
second half of the century, beyond 
the time scope of this study. b, CO2 
emissions from fossil fuels. c, Total 
radiative forcing. d, Global mean 
surface temperature change (from 
pre-industrial average 1750-1849). 
The literature range is obtained from 
refs 29 and 30. 


Figure 3 | Global energy 
consumption and radiative forcing 
in 2050. a, Differences in energy use 
by sector and fuel (the Abundant Gas 
scenario minus the Conventional 
Gas scenario) in 2050. One 

avenue for possible change in the 
transportation sector is through the 
use of gas in transportation fuel 
production. The MESSAGE model 
reports this effect at a noticeable scale 
(10%). b, Year 2010 and year 2050 
composition of radiative forcing for 
the Conventional Gas scenario for 
five models. c, Year 2050 relative 
difference in radiative forcing 

(the Abundant Gas scenario minus 
the Conventional Gas scenario) 

for the five models. 1% difference 

in forcing for model average is 


equivalent to 0.042 W m⁻².


changes in emissions or climate forcing. The result stems from three 
effects: abundant gas substituting for all energy sources; lower energy 
prices increasing the scale of the energy system; and changes in non-CO2
emissions. This result is potentially sensitive to a range of model
assumptions. 

One important assumption is that market forces are allowed to work 
themselves out largely unfettered. Our results would be different if pol- 
icies that limit natural gas’s ability to substitute for low-carbon energy 
were implemented on a global scale. To explore this sensitivity we recal- 
culated the emissions assuming that abundant gas substitutes exclusively 
for coal. This assumption is analogous to a global clean energy standard 
where the capacities of carbon-free energy sources are exogenously 
specified. With the exception of BAEGEM, the models reported CO2
emission reductions of between 0.1% and 6% (Extended Data Table 2).
BAEGEM’s result of a 7% increase was driven by an overall energy expan- 
sion of 11%. 

The results are also influenced by assumptions about technological 
change in other domains. Although the results reported here assumed 
changes to gas supply technology alone, oil production is experiencing 
similar technological advances. Extending the analysis to oil as well as 
gas production would not be expected to lower future CO2 emissions
or climate forcing because the carbon-to-energy ratio for oil is approxi- 
mately 35% higher than that of natural gas. 

Fugitive methane emissions associated with natural gas production,
transmission, and distribution are another important factor. On the one
hand, conventional estimates for natural gas methane leakage rates have
been less than 2% of production, and studies have shown that the
leakage rate is not considerably different between conventional and
unconventional sources. On the other hand, other studies have reported
substantially higher leakage rates. To test the sensitivity of our results
to these assumptions, we chose the highest value (7.9%; ref. 7) from a 
range of methane leakage rates found in the literature. We then recal- 
culated climate forcing and found that the effect of abundant gas is to 
increase climate forcing by 0.2% to 12% in 2050, which is 0.5% to 5% 
higher than in our central scenario (Extended Data Fig. 1). In other words, 
the finding that abundant gas does not discernibly reduce climate forcing 
is consistently reported over a wide range of fugitive methane rates found 
in the literature. Furthermore, under the high fugitive emission assumption,
three models reported increased climate forcing of more than 5%.

This analysis focused solely on the potential of abundant gas to affect 
greenhouse gas emissions in the absence of greenhouse gas mitigation 
policies beyond those already in effect. The interaction between abundant
natural gas and greenhouse gas mitigation policies is another issue
in need of further examination. Finally, we note that the global deployment
of improved natural gas extraction technology carries implications
not only for climate change, but also for many other important concerns
including air and water quality, energy security, access to modern energy,
and economic growth.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


Gas supply curves. Natural gas supply curves are harmonized across all the 
models. The Conventional Gas scenario represents an estimate of economically 
recoverable gas consistent with technology available before the shale gas revolu- 
tion. This natural gas supply curve is constructed based on the synthesis of natural 
gas supply and geographic distribution in the GEA report. GEA’s 2010 natural
gas supply curve is truncated at 11,000 EJ of cumulative global supply to represent 
limited conventional gas supply. No future cost reduction from technological change 
is assumed. This curve represents a lower bound of global natural gas supply (see 
green curve in Extended Data Fig. 2).

The Abundant Gas scenario represents an upper bound of global natural gas supply. 
This supply curve is characterized by both the global abundance of the resources 
and the substantially lower extraction costs. Global abundance is implemented by 
allowing both conventional and unconventional gas in GEA estimates to be fully
available for extraction. A cumulative supply of 39,000 EJ is assumed to be available 
globally. 

Lower extraction cost is implemented by future technological changes reducing 
extraction cost over time. The extraction costs of early adopters (USA and Canada) 
are assumed to reduce exponentially by 1.7% per year over the course of 2011-2050. 
Extraction costs from all other regions are assumed to reduce exponentially by 2.0% 
per year over the course of 2016-2050. In all regions, the cost of extraction is reduced 
to half by 2050 from the GEA’s 2010 estimate. 
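
The stated annual rates can be checked against the "reduced to half by 2050" claim. This is a minimal sketch that assumes discrete annual compounding from the year before each reduction period begins (2010 for the early adopters, 2015 for all other regions); the compounding convention and the names used are assumptions, since the text does not specify them.

```python
def remaining_cost_fraction(annual_rate, base_year, end_year=2050):
    """Fraction of the base-year extraction cost remaining at end_year,
    assuming the cost falls by annual_rate every year after base_year."""
    years = end_year - base_year
    return (1.0 - annual_rate) ** years


# Early adopters (USA and Canada): 1.7% per year over 2011-2050
early_adopters = remaining_cost_fraction(0.017, base_year=2010)

# All other regions: 2.0% per year over 2016-2050
other_regions = remaining_cost_fraction(0.020, base_year=2015)

print(f"Early adopters, 2050 cost relative to 2010: {early_adopters:.3f}")
print(f"Other regions, 2050 cost relative to 2010:  {other_regions:.3f}")
```

Under this convention both regions end within about one percentage point of half the 2010 cost; a continuous exponential decay at the same rates gives a similar result.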

This is on the aggressive side of the cost reduction estimates found in the
literature. For instance, the International Energy Agency (IEA)’s ‘Golden Age of Gas’
scenario shows a 23% reduction by 2035; Newell and Raimi assume a 45%
reduction by 2040; the EMF 26 model comparison exercise’s ‘High Shale’ scenario
assumes a 21% reduction by 2035; and finally, the original GEA scenarios assumed
a 33% reduction by 2050. Although direct comparison is difficult owing to each
scenario’s differences in design, measurement, and time frame, our aggressive
assumption is intended to represent a lower bound for future gas production costs.

The cost reduction assumption is also more aggressive than that of relatively 
mature low-carbon energy sources such as nuclear power plants, but not neces- 
sarily more aggressive compared to that of immature technologies such as solar 
photovoltaics (see Extended Data Table 1 for cost reduction comparison with other 
energy sources). 

When the abundance in quantity and reduction in production cost are com- 
bined, the Abundant Gas supply curve allows 31,000 EJ of cumulative natural gas 
production at $3 per GJ or less by 2050. Extended Data Fig. 2 shows the reduction 
in cost from 2010 (red curve) to 2050 (blue curve). These production costs are dif- 
ferent from the actual prices in the market place. The costs do not include taxes or 
royalties, nor do they include external environmental or social costs associated 
with gas production.

GCAM and MESSAGE models further tested sensitivity to the magnitude of 
production cost reductions. GCAM is a model relatively less sensitive to Abundant 
Gas supply, while MESSAGE is a model relatively more sensitive to it. The two 
models projected a total of five natural gas supply scenarios: (1) the Conventional 
Gas scenario; (2) the Abundant Gas scenario (at the standard 50% cost reduction); 
(3) the Abundant Gas scenario (with the high cost reduction of 75%); (4) the Abun- 
dant Gas scenario (with the low cost reduction of 25%); (5) the Abundant Gas sce- 
nario (with the zero cost reduction, abundant in quantity only). 

The results are shown in Extended Data Fig. 3. Collectively, these scenarios cover 
a wide range of cost reduction found in the literature. Our core finding from the 
main analysis is found to be consistent with results from this sensitivity. In all cases 
considered, we found that more abundant natural gas could substantially change 
the global energy system over the decades ahead without producing commensurate 
changes in emissions or climate forcing. 

GCAM reported +13% to +82% additional natural gas consumption in 2050,
while the change in CO2 emissions is found to be in the −0.9% to −2.0% range and the
change in radiative forcing is found to be in the +0.3% to +1.1% range. MESSAGE
reported +56% to +170% additional natural gas consumption in 2050, while its
change in CO2 emissions is found to be in the −1.0% to +0.6% range and its change
in radiative forcing is found to be in the +0.7% to +3.4% range.

Just as in the main analysis, the models did not agree on the direction of the 
impact on CO2 emissions. GCAM consistently reported lower CO2 emissions with
respect to lower cost assumptions. MESSAGE reported that the CO2 emissions
increase at the high end of the cost reduction range. However, these changes are 
very small, with a magnitude less than 2% of the total emissions. Once we consider 
the combined effect of all greenhouse gases, the two models consistently agree on 
the direction of the change: the lower the natural gas production cost, the higher the 
total radiative forcing and associated temperature change. Our main finding that 
increased use of abundant gas does not produce a discernible reduction effect on cli- 
mate forcing is found to be consistent across the range of cost reduction sensitivities. 


Main analysis. The main analysis presented in the paper may be sensitive to a 
range of model assumptions. We reported two of the core sensitivities in the main 
text. Here we describe detailed methodologies for the main analysis and the sen- 
sitivity analyses. 

The main analysis follows a standard method for IAM study on baseline scenarios. 
The five models are explicitly designed to project the future emissions trajectory 
under various assumptions about the energy system and the economy. Represent- 
ing the energy and economic system in an abstract structure, IAMs provide a sim- 
ulation method of conducting an analysis when a ‘controlled experiment’ in the 
strict sense is not possible. Similar to a controlled experiment, our numerical exper- 
iment keeps all other parameters constant and varies only the natural gas supply 
curve. We then simulate the effect of market forces on the energy system evolution 
through 2050. From the two simulations that differ only in terms of natural gas 
supply, we report the differences in the output variables, such as energy system com- 
position, emissions, and climate forcing. Such differences in output variables are 
directly attributable to the differences in input variables. 
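
The paired-run comparison described above amounts to differencing the outputs of two simulations that share every input except the gas supply curve. The sketch below is a hedged illustration of that bookkeeping only: the function name is hypothetical, and the numbers are arbitrary placeholders chosen to show the arithmetic, not results from any of the five models.

```python
def percent_difference(conventional, abundant):
    """Percentage change of each output variable in the Abundant Gas run
    relative to the Conventional Gas run (same keys in both dicts)."""
    missing = set(conventional) ^ set(abundant)
    if missing:
        raise KeyError(f"outputs present in only one run: {sorted(missing)}")
    return {
        name: 100.0 * (abundant[name] - conventional[name]) / conventional[name]
        for name in conventional
    }


# Placeholder outputs for one hypothetical model in 2050 (NOT model results)
conventional_run = {"gas_consumption": 100.0, "co2_emissions": 50.0}
abundant_run = {"gas_consumption": 150.0, "co2_emissions": 51.0}

differences = percent_difference(conventional_run, abundant_run)
for name, change in differences.items():
    print(f"{name}: {change:+.1f}%")
```

Because only the supply curve differs between the two runs, each percentage reported this way can be read as the effect of the supply assumption on that output within the model in question.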

To closely replicate the human system dynamics, each model calibrates its para- 
meters to the observed data in the historical years. The data used for calibration is 
reported in the model descriptions. Calibrated parameters include technological 
parameters such as energy production efficiency and emissions intensity, economic 
parameters such as price elasticities and income elasticities, and non-market para- 
meters such as regional preferences for specific fuel type or preferences for a specific 
mode of travel. Projecting into the future, some parameters are assumed to improve 
over time (for example, energy production efficiency) and others are assumed to be 
constant (for example, social preferences for a specific mode of travel are assumed 
to be constant). 

As this is a baseline scenarios study, we did not assume any explicit climate change 
mitigation policies. This study addresses the following question: if there are no new 
policies to mitigate climate change, does increased use of abundant natural gas reduce 
greenhouse gas emissions? However, although no economy-wide climate change
mitigation policy is currently in place, existing policies that were
implemented in the past affect the parameters calibrated to
historical observations. For instance, the Corporate Average Fuel Economy (CAFE)
standard in the USA has been enforced since 1978 to increase the fuel economy of 
cars and decrease fuel consumption. Although this was not explicitly intended as a 
climate change mitigation policy, it has had the side-effect of reduced emissions 
per distance travelled. This side-effect would be implied in the calibration process 
and propagate forward into the future. As a result, projected future emissions
are lower than they would have been without the CAFE
standard in effect.

Similarly, any energy policies that were enforced before the calibration periods
would affect the calibration parameters. These include Renewable Fuel Standard
policies that mandate the ratio of biofuels in gasoline, building energy standards 
that mandate the minimum efficiency levels of building shells, and renewable or 
fossil fuel energy subsidies. These policies affect the implied preference for certain 
energy sources or efficient equipment. However, we do not include the policies that 
are proposed, but not currently in effect. For studies that do include proposed poli- 
cies in their scenarios, see refs 31 and 32. Next, we describe one sensitivity analysis 
that explicitly represents a future energy policy that is currently not included in the 
calibration. 
Abundant gas exclusively substitutes coal scenario. In our main analysis, the models
allow natural gas to substitute not only for coal, but also for a range of energy sources 
including solar, wind, nuclear, and bioenergy. These substitutions are driven by the 
economic competitiveness of each fuel type. However, it is also possible to imagine a 
policy architecture in which a normative policy protects low-carbon energy sources, 
thus effectively forcing additional natural gas to exclusively substitute for coal. Under
such restrictions, we expect overall CO2 emissions to decrease. To estimate the
magnitude of the sensitivity to the substitution restriction, we assume a future where
low-carbon energy sources are protected by a globally enforced Clean Energy Stan- 
dard. In this scenario, natural gas is assumed to exclusively substitute for coal. 

First, we assume that the low-carbon energy quantity is fixed at the same level as 
the Conventional Gas scenario. Then, we calculate the quantity difference in low- 
carbon energy between the Conventional Gas and Abundant Gas scenarios; this is 
the amount of low-carbon energy that would be protected under the policy. To keep 
the scale of energy system unchanged, we assume the same amount of coal is instead 
substituted by additional gas. The total amount of gas consumption remains unchanged. 
We then apply the emissions factors from Extended Data Table 3 and recalculate 
the additional emission reduction. 
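The recalculation described above can be sketched as follows. All energy quantities and emission factors here are hypothetical round numbers chosen for readability; the study itself uses each model's output and the factors in Extended Data Table 3. The function name `coal_substitution_only` is ours.

```python
from typing import Dict

Energy = Dict[str, float]  # primary energy by source, EJ/yr


def co2_emissions(energy: Energy, factors: Dict[str, float]) -> float:
    """Total CO2 (Gt/yr); sources without a factor are treated as zero-carbon."""
    return sum(q * factors.get(src, 0.0) for src, q in energy.items())


def coal_substitution_only(conventional: Energy, abundant: Energy) -> Energy:
    """Restore low-carbon energy to its Conventional Gas level and remove the
    same amount of coal, leaving gas use and total energy unchanged."""
    protected = conventional["low_carbon"] - abundant["low_carbon"]
    adjusted = dict(abundant)
    adjusted["low_carbon"] += protected
    adjusted["coal"] -= protected
    return adjusted


# Hypothetical 2050 values, chosen only to make the arithmetic easy to follow.
conventional = {"coal": 200.0, "gas": 150.0, "low_carbon": 100.0}
abundant = {"coal": 180.0, "gas": 200.0, "low_carbon": 80.0}
factors = {"coal": 0.095, "gas": 0.056}   # illustrative Gt CO2 per EJ

adjusted = coal_substitution_only(conventional, abundant)
reference = co2_emissions(conventional, factors)
change_main = co2_emissions(abundant, factors) / reference - 1.0
change_restricted = co2_emissions(adjusted, factors) / reference - 1.0
print(f"main analysis: {change_main:+.1%}, coal substitution only: {change_restricted:+.1%}")
```

With these illustrative inputs the unrestricted case raises emissions while the restricted case lowers them, which is the qualitative pattern the sensitivity test is designed to expose.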

With the exception of BAEGEM, all models show that the ‘coal substitution only’
assumption results in emission reductions in 2050, ranging from −0.1% (WITCH)
to −5.9% (MESSAGE). See Extended Data Table 2 for the range of values. Compared
to the emissions changes in the main analysis, in which the majority of the
models showed an increase in emissions, the ‘coal substitution only’ scenario shows
that under certain policy conditions, abundant gas could help reduce CO2 emissions.

In the case of BAEGEM, the lower gas prices accelerate economic activity, such
that the overall energy system is 11% larger in the Abundant Gas scenario. The
‘coal substitution only’ assumption does reduce the average emission intensity of 
the energy system, but the energy system expansion effect still dominates, such that 
the total emission is still larger than in the Conventional Gas scenario. 
High fugitive methane emissions scenario. The fugitive methane emission rate is 
subject to large uncertainty. The rates used in the five models all fall within the 
range 0.3-0.6 kg of CH4 per GJ (Extended Data Table 3). These values are similar
to the values reported in conventional literature. However, some recent literature
suggests that the fugitive methane rate may be substantially higher, by up to a
factor of four. To test our results’ sensitivity to high fugitive methane rates, we
select the upper bound of fugitive methane estimates (7.9%) found in the literature
and re-estimate the climate forcing.

We start from the original emission trajectories from each model. Then, while
keeping all else equal, we recalculate the methane emission trajectory by applying
the high fugitive methane emission rate to the natural gas use. These modified emis- 
sion trajectories are then reprocessed through the common climate model MAGICC6. 
With the high fugitive methane assumptions, abundant gas increases the total
anthropogenic radiative forcing by 0.2% to 12% in 2050, which is 0.5 to 5 percentage points
higher than under the standard assumptions. A full comparison is shown in Extended
Data Table 4. 
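The methane recalculation step can be illustrated with a short sketch. The conversion from a leakage fraction to kg of CH4 per GJ is our assumption (gas treated as pure methane with a heating value of roughly 50 MJ per kg); the paper does not state which conversion it uses, and the trajectories below are hypothetical. The climate-model step (MAGICC6) is not reproduced here.

```python
from typing import List


def leak_fraction_to_kg_per_gj(leak_fraction: float,
                               lhv_mj_per_kg: float = 50.0) -> float:
    """Convert a leakage fraction into kg CH4 per GJ of gas.

    Assumption: the gas is treated as pure methane with a heating value of
    about 50 MJ/kg, so 1 GJ of gas corresponds to about 20 kg of CH4.
    """
    return leak_fraction * 1000.0 / lhv_mj_per_kg


def recalc_methane(total_ch4_mt: List[float], gas_use_ej: List[float],
                   base_rate: float, high_rate: float) -> List[float]:
    """Swap the fugitive rate (kg CH4 per GJ) while keeping all else equal.

    Units: 1 kg/GJ x 1 EJ = 1e9 kg = 1 Mt, so rate x EJ gives Mt CH4 directly.
    """
    return [ch4 + (high_rate - base_rate) * gas
            for ch4, gas in zip(total_ch4_mt, gas_use_ej)]


high_rate = leak_fraction_to_kg_per_gj(0.079)      # upper-bound leakage from the text
original_ch4 = [350.0, 400.0]                      # hypothetical Mt CH4/yr
gas_use = [100.0, 200.0]                           # hypothetical EJ/yr
modified_ch4 = recalc_methane(original_ch4, gas_use, base_rate=0.5, high_rate=high_rate)
print(high_rate, modified_ch4)
```

Only the gas-related fugitive component changes; all other methane sources in the original trajectory are carried over unchanged.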
IAMs of energy-economy-climate systems. The five models that are used in this 
study are members of a class of models referred to as IAMs. IAMs in general encompass
the broad suite of human and natural Earth systems including the economy,
energy, agriculture, land-use, land cover, and biogeophysical processes from carbon
and hydrologic cycles, the atmosphere, oceans and climate. The five models
employed in this study are well equipped to assess the impact of abundant natural 
gas on climate forcing. Each contains a state-of-the-art energy-economy systems 
model coupled to a simple climate model. 

Each of the five models represents energy and economic systems differently. 
Below, we provide a general description of the strengths and limitations the five 
models bring to the issue of assessing the impact of abundant natural gas on cli- 
mate forcing. This is followed by more detailed descriptions of the five models. The 
IAMs of energy-economy-climate systems employed in this study bring a number 
of strengths to the issue of the global long-term climate forcing implications of abun- 
dant natural gas. In general, the models were designed to address precisely the kind 
of problem we explore in this paper. They have the appropriate geographic, tem- 
poral, and sectoral coverage. 

All of the models explicitly represent processes that start from the extraction of 
primary energy (exhaustible fossil fuels and renewable energy) to energy transfor- 
mation (for example, liquid fuel refineries and power generation) to end-use services 
(buildings, industry, and transport). The models feature explicit representation of 
energy markets with price-responsive demand and supply for coal, oil, and gas as 
well as low-carbon energy sources. The flexibility and interdependence of energy
markets are crucial for the present study because these features determine the degree 
to which additional natural gas is consumed and by how much this reduces the 
demand for other fuels. All five models employ a standard economic paradigm in 
their representation of energy markets. Price is the principal force determining and 
equilibrating the supply of and demand for different fuels. 

IAMs vary in a number of important ways. While all of the models in this study 
have explicit representations of both the economic system and the energy system, 
they vary in terms of their relative emphasis on representing the details of the two 
interlinked systems. Model structures that emphasize economic interactions across 
all sectors of the industry are particularly strong for examining how changes in one 
industry propagate through the whole economy. These models are also strong in 
examining changes in international trade patterns due to region-specific changes in 
industrial structure. BAEGEM is one example of this type of model, with 25 explicit
sectors of the economy each consuming a bundle of energy sources, where the share 
of the bundle is determined by the relative prices. 

In contrast, GCAM and MESSAGE place greater emphasis on representing
the details of the physical energy system. They contain detailed representations of 
key energy systems and technology options for producing, transforming and using 
energy, while adopting more aggregate representations of the broader economy. 
GCAM and MESSAGE have more than 100 different energy supply and conversion 
technology representations. This approach is more rigid in the ability to substitute 
between the factors of production, namely capital/labour inputs and energy inputs, 
compared to the approach used by models such as BAEGEM. However, it can better 
capture the physical details of individual services provided in the end-use sectors, 
such as ton-kilometres of freight or GJ of residential heating. This modelling approach 
is particularly strong for in-depth analysis of a specific energy technology and tracking 
the physical flows of energy goods and services.

REMIND and WITCH lie between BAEGEM on the one end and GCAM
and MESSAGE on the other. These models have more detail in their economic 
structure and less energy system detail than the latter, but more energy system detail 
and less economic system detail than the former. 

Another domain in which the models vary is their assumption about the know- 
ledge and behaviour of their economic agents. Intertemporal optimization models 
represent economic agents that maximize their economic utility over the model 
time horizon. The agents are assumed to know future price changes with certainty 
and hence the resulting model solution is economically optimal for each agent. 
MESSAGE, REMIND, and WITCH employ this method. 

On the other hand, GCAM and BAEGEM take a more descriptive approach
where the economic agents are assumed to not know the future price changes, but
rather make production and consumption decisions based only on the informa- 
tion available to them at any given time. This approach provides insights about 
how the system might be expected to evolve under imperfect information about 
the future. However, the resulting equilibrium trajectory may not be economically 
optimal. 
Modelling paradigm for IAMs. The variety in modelling approaches used by the 
five models in this exercise strengthens this analysis. This diversity guards against a 
result that is the product of an individual model’s idiosyncratic behaviour. To the 
extent that models employ different methodologies and get qualitatively similar 
results, we have greater confidence in the result. That said, all models subscribe to 
the standard economics paradigm. Other modelling paradigms exist, such as adap- 
tive agent models, system dynamics models, or infrastructure models. We have not 
tested these modelling paradigms in this paper. As such, the models do not span 
the full range of all possible methods that could potentially be employed to assess 
the impact of more abundant natural gas for climate forcing. 

One limitation of the models employed in this study is that they do not model 
explicit locations of physical infrastructure. The geographic locations of present 
and future natural gas pipelines and liquefied natural gas terminals are modelled at 
coarse international resolution, and do not take into account detailed local infor- 
mation that shapes decisions about which facilities are deployed, and where and 
how they are connected to the broader system. Geographically resolved infrastruc- 
ture models can potentially include this level of detail. However, infrastructure 
models are relatively static in nature and are therefore generally not employed to 
model the global energy system’s evolution over multiple decades into the future. 

Also, the five models are built on the foundation of the standard economic paradigm,
and they do not, for instance, employ an adaptive agent modelling approach
or systems dynamics approach. Future research could employ a broader suite of 
modelling methods to shed further light on the implications of abundant natural 
gas for climate forcing, and examine whether other modelling approaches would 
yield a qualitatively different result. 
Representing the policy environment in IAMs. The energy policy environment 
exerts a strong influence on energy production and use and thereby on climate forc- 
ing. Our default assumption is that no new policies and measures are introduced 
after the calibration period. Alternative assumptions can produce different results 
for energy, for the economy and for climate forcing. We tested one energy policy 
that can potentially change the results. We found that exogenously specifying the 
quantity of low-carbon energy sources and forcing natural gas to substitute exclu- 
sively for coal results in emissions being reduced in the models. 

Other policies, such as a carbon tax, cap-and-trade, or natural-gas vehicle subsidies,
could alter our results. The climate implications of abundant gas under climate
policies are of great interest. However, the issue is sufficient in scope and depth to 
require its own future research, and hence is not addressed here. 
Baseline assumptions in IAMs. Finally, we point out that the numerical simula- 
tions of the effects of increased natural gas availability that we have performed for 
this paper are all based on each model’s native reference scenario assumptions. Those 
scenarios are each developed by the modelling teams themselves and no attempt 
was made to harmonize assumptions other than natural gas supply curves. This 
was intentionally done to increase the variety of conditions against which the impli- 
cations of abundant gas would be assessed. 

Specifying different exogenous assumptions would produce different results. Some 
perturbations have well established consequences common to all of the models. 
For example, higher population growth or higher rates of economic productivity
growth increase the scale of the energy system overall. Perturbations in the assumed
rate at which technological change occurs in low-carbon technologies would
change the future emission intensity of the energy system.

While we have not attempted to explore the sensitivity of our results to variation 
in those assumptions, the five models’ native assumptions cover a wide uncertainty 
range consistent with the large majority of the literature, but they do not cover
the extremes found in the literature (see Extended Data Fig. 4). Examining the four
principal components of model projections (population, gross domestic product
(GDP), energy consumption, and CO2 emissions), we observe that the five models
cover most of the 10th to 90th percentile range for population and GDP in 2050.
The projected ranges for energy consumption and CO2 emissions are narrower,
but cover most of the 25th to 75th percentile range. To the extent that the five 
models report qualitatively similar results despite the large variations in baseline 
assumptions, we have greater confidence in the results. 

Uncertainties in IAMs. Uncertainty attends every element of the modelling process.
There are two major elements of uncertainty in the modelling system: model
structure and model input assumptions (future population, economic activity,
technology and policy). Methods have been developed to address each of these
uncertain elements. Uncertainty in model input assumptions is addressed in a variety of
ways ranging from simple sensitivity analysis to formal uncertainty quantification
analysis. Uncertainty in model structure is more difficult to address, and model
intercomparison is an important tool for exploring this source of uncertainty.

The formal Monte Carlo analysis employs a single model to process numerous
samples of input variables, such as fossil fuel supply, economic growth,
technological learning, and so on. The distribution of the output shows the range of
uncertainty associated with the model results. For simple models, uncertainties asso- 
ciated with all model inputs can be examined. However, with the growing complexity 
of IAMs, this method is becoming increasingly difficult to implement. 
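A minimal sketch of the Monte Carlo procedure follows. The response function `toy_model`, the three sampled inputs and their uniform ranges are all hypothetical; a real application would substitute a full model run for each sample, which is what makes the method costly for complex IAMs.

```python
import random
import statistics
from typing import Callable, List


def toy_model(gas_cost: float, gdp_growth: float, learning_rate: float) -> float:
    """Hypothetical response surface for 2050 CO2 (Gt/yr); not a real IAM."""
    scale = (1.0 + gdp_growth) ** 35            # economic growth over 35 years
    intensity = (1.0 - learning_rate) ** 35     # declining emission intensity
    gas_effect = 0.9 + 0.1 * gas_cost           # weak dependence on gas cost
    return 30.0 * scale * intensity * gas_effect


def monte_carlo(model: Callable[..., float], n_samples: int, seed: int = 0) -> List[float]:
    """Draw input samples, run the model on each, and collect the outputs."""
    rng = random.Random(seed)                   # seeded for reproducibility
    results = []
    for _ in range(n_samples):
        results.append(model(
            gas_cost=rng.uniform(0.5, 1.5),
            gdp_growth=rng.uniform(0.01, 0.03),
            learning_rate=rng.uniform(0.0, 0.02),
        ))
    return results


samples = monte_carlo(toy_model, n_samples=2000, seed=42)
cuts = statistics.quantiles(samples, n=20)      # 19 cut points: 5%, 10%, ..., 95%
p05, p50, p95 = cuts[0], cuts[9], cuts[18]
print(f"5th: {p05:.1f}, median: {p50:.1f}, 95th: {p95:.1f}")
```

The spread between the 5th and 95th percentiles is one way to summarize the output uncertainty that results from the assumed input distributions.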

Sensitivity analyses identify key variables of interest and examine each model’s 
response to variation in those input values. We have identified three such variables 
and performed sensitivity analyses. We have also examined sensitivity to energy 
policy, specifically sensitivity to low-carbon energy policy. In this latter case we found 
that under a global low-carbon energy protection policy, the availability of more 
abundant gas can reduce climate forcing. We also explored the sensitivity of our 
results to the rates of fugitive methane emissions and found that with high fugitive 
methane emission rates, more abundant gas can discernibly increase climate forc- 
ing. Finally, we explored the sensitivity of model outputs to natural gas supply char- 
acterizations, specifically production cost, and found that our results were consistent 
across a wide range of natural gas supply assumptions. 

Model intercomparison projects (MIPs), where a number of models simulate a 
common set of scenarios, are the primary method employed to explore the impli- 
cations of variation in model structure, although they are also used to structure 
sensitivity and scenario analysis. The Energy Modelling Forum (EMF) has been
conducting MIPs of energy-economy models since 1977. The MIPs conducted by
EMF are analogous to the Coupled Model Intercomparison Project (CMIP) of
the climate modelling community, where a larger number of climate models project 
the future climate and assess the distribution of the projection. MIPs can be thought 
of as the modelling equivalent of scientific hypothesis testing using different methods. 
If a number of models with heterogeneous architecture reach a common conclu- 
sion, we can have greater confidence in that conclusion. 

This study is an example of a MIP with a small number of models. The comparison
of the results across the models shows large uncertainty. The uncertainty is
especially large in the future level of natural gas consumption, and consequently 
the use of competing energies, such as solar photovoltaics and nuclear power plants. 
The uncertainty is also present in the size of the impact of abundant gas on the 
emissions. These results are highly dependent on model architecture and the implied 
flexibility of fuel-switching. However, the models all agree on the most potent con- 
clusion: increased supply of abundant gas does not discernibly reduce either CO2 
emissions or climate forcing. Some models report a discernible increase in emis- 
sions or climate forcing, and others report negligible change. But none of the models 
report more than a 2% reduction in emissions or climate forcing. This qualitative 
agreement across five heterogeneous models in this exercise gives greater confi- 
dence in the conclusion. 
Overview of the BAEGEM model. BAEGEM is a recursively dynamic computable
general equilibrium model of the world economy. For each one-year time
step, BAEGEM simulates the interrelationships between economic growth, flows 
of international trade and investments, constraints on natural resources and pro- 
duction factors, greenhouse gas emissions and climate change policies. 

The central core of BAEGEM is built on the familiar approaches of the GTAP 
model, with the household consumption behaviour and the producer behaviour
represented separately by a constant difference of elasticities function and a nesting
of Leontief, constant elasticity of substitution (CES) and constant ratios of
elasticity of substitution, homothetic (CRESH) functions.

BAEGEM is written in GEMPACK. The full model code is complemented by
four interlinked modules: (1) the government module; (2) the technology mix module; 
(3) the energy module; and (4) the greenhouse gas emissions module. The model is 
ideally suited to analysing domestic and international energy-related policies, and 
the impacts of economic shocks. 

The BAEGEM database is derived from a number of sources. The global social 
accounting matrix is derived from the GTAP version 8 database with a base year
of 2007. To enhance the capability of modelling individual commodities, the number
of commodity groups in BAEGEM has been expanded from 57 in the GTAP version
8 database to 72. The disaggregated commodities include black thermal coal, brown
coal, coking coal, iron ore, bauxite, copper ore, gold, uranium, titanium, zirconium, 
coke, nuclear fuel, alumina, copper, aluminium and liquefied natural gas. 

The emissions database covers all Kyoto gases and is sourced from the International
Energy Agency (IEA), the United Nations Framework Convention on Climate
Change and the US Environmental Protection Agency. The data in the government
module are sourced from Global Insight while the data in the technology mix
and energy modules are sourced from the IEA. 

The global temperature rise, total radiative forcing and the atmospheric concen- 
tration of carbon dioxide can be calculated from the BAEGEM results by linking to 
MAGICC, with climate sensitivity set to three degrees Celsius. BAEGEM natively
links to MAGICC 5.3, but for this study we have used MAGICC 6.0 to reflect the latest
scientific knowledge and for consistency across models.

Supplementary Fig. 1 and Supplementary Table 1 provide an overview of the other
key features of BAEGEM.
Modelling energy commodities in BAEGEM. The energy module tracks the 
production of primary and secondary energy, and the consumption of final energy 
by government, households and firms. Changes in production volume over the projection
period are driven by global demand growth, which in turn is determined by
real GDP growth, and changes in prices, consumption preference, sector productivity
and market structure.

The government demand for each commodity is derived from a Cobb-Douglas 
function nested with Armington composites of commodities supplied by domestic 
and foreign sources. The household demand for each commodity is determined by 
the demand of a representative household and the growth in population. At the 
first level, the representative household chooses quantities of non-energy commod- 
ities and an energy composite (that is, coal, gas, refined petroleum product, elec- 
tricity and heat) to maximize a utility function, given a budget constraint. At the 
next level, the representative household chooses quantities of energy commodities 
to minimize the cost of consuming the energy composite in the previous level. The 
purpose of this two-level demand system is to reflect better the substitutability between 
energy commodities with a more flexible substitution system. 

Demands for energy commodities in each production sector are derived from a 
nesting of Leontief, CES and CRESH functions. At the first level, a Leontief tech- 
nology links the input of factor-energy composite to the industry output level. At 
the second level, it is a CES cost minimization problem searching for an optimal 
combination of energy and factor composites where energy commodities and prim- 
ary factors (that is, capital, labour, land and natural resources) are substitutable, but 
not perfectly so. For land and natural resource-intensive industries (that is, crops, 
livestock, coal, oil, and gas), a CES structure with imperfect substitutability ensures 
that constraints on land and natural resource or more intensive use of capital and 
labour under finite natural resources can be modelled properly in BAEGEM. At the 
third level, another cost minimization problem is specified, but here it searches for 
an optimal combination of energy commodities under a CRESH production function. 
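The second-level CES cost minimization has a closed-form solution, sketched below for illustration. This uses the textbook CES form with share parameters and a single elasticity of substitution (not equal to one); the parameter values are hypothetical and are not BAEGEM's calibrated values.

```python
from typing import List, Sequence


def ces_unit_cost(prices: Sequence[float], shares: Sequence[float], sigma: float) -> float:
    """Unit cost c = (sum_i d_i^sigma * p_i^(1 - sigma))^(1 / (1 - sigma)), sigma != 1."""
    total = sum(d ** sigma * p ** (1.0 - sigma) for p, d in zip(prices, shares))
    return total ** (1.0 / (1.0 - sigma))


def ces_unit_demands(prices: Sequence[float], shares: Sequence[float],
                     sigma: float) -> List[float]:
    """Cost-minimising input per unit of output: x_i = (d_i * c / p_i)^sigma."""
    c = ces_unit_cost(prices, shares, sigma)
    return [(d * c / p) ** sigma for p, d in zip(prices, shares)]


def ces_output(inputs: Sequence[float], shares: Sequence[float], sigma: float) -> float:
    """CES aggregate Y = (sum_i d_i * x_i^rho)^(1 / rho), rho = (sigma - 1) / sigma."""
    rho = (sigma - 1.0) / sigma
    return sum(d * x ** rho for x, d in zip(inputs, shares)) ** (1.0 / rho)


shares = [0.3, 0.7]      # hypothetical weights: energy composite, factor composite
sigma = 0.5              # hypothetical elasticity: substitutable, but not perfectly so

x_ref = ces_unit_demands([1.0, 1.0], shares, sigma)
x_dear = ces_unit_demands([2.0, 1.0], shares, sigma)   # energy price doubles
print(x_ref[0] / x_ref[1], x_dear[0] / x_dear[1])
```

When the energy price doubles, the energy-to-factor input ratio falls but does not collapse, which is the imperfect substitutability the text describes.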

Electricity supply from various technologies is modelled inside the technology 
mix module. The ‘technology bundle’ approach ensures that electricity output can 
be produced from a bundle of individually identified generation technologies and 
that each technology uses a different mix of inputs. The purpose of integrating a 
bottom-up modelling approach for the electricity sector into BAEGEM is to rep- 
resent better the technology-specific detail of the sector while retaining the benefits 
of the top-down interactions modelled in BAEGEM. In this application, the elec- 
tricity output is the sum of nine technologies: coal; oil; gas; nuclear; hydro; wind; 
solar; biomass; and others. 

The substitution possibilities between electricity technologies in BAEGEM are 
governed by a CRESH aggregation function. CRESH is a generalization of CES and 
allows elasticities of substitution to vary between its elements. In other words, certain
technologies identified in the framework can be assumed to be more substitutable 
than others. The use of the family of CRESH aggregation functions allows for the 
fact that electricity, which is a homogenous output, can be generated in an eco- 
nomy simultaneously from different technologies with different production costs. 
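To illustrate how CRESH differs from CES, the sketch below uses the linearized (percentage-change) demand form that is common in GEMPACK-style models: x_i = z - sigma_i * (p_i - sum_k S*_k p_k), where S*_k are cost shares reweighted by the technology-specific elasticities. We assume this standard form for illustration; the source does not give BAEGEM's own equations, and the shares and elasticities below are hypothetical.

```python
from typing import List, Sequence


def cresh_demand_changes(cost_shares: Sequence[float], sigmas: Sequence[float],
                         price_changes: Sequence[float],
                         output_change: float = 0.0) -> List[float]:
    """Percentage changes in input demands under linearized CRESH.

    x_i = z - sigma_i * (p_i - sum_k S*_k * p_k),
    with modified shares S*_k = sigma_k * S_k / sum_j (sigma_j * S_j).
    If all sigma_i are equal, this reduces to the CES case.
    """
    weight_total = sum(s * sig for s, sig in zip(cost_shares, sigmas))
    modified = [s * sig / weight_total for s, sig in zip(cost_shares, sigmas)]
    avg_price = sum(m * p for m, p in zip(modified, price_changes))
    return [output_change - sig * (p - avg_price)
            for sig, p in zip(sigmas, price_changes)]


# Hypothetical three-technology bundle: coal, gas, wind.
cost_shares = [0.5, 0.3, 0.2]
sigmas = [2.0, 2.0, 0.5]             # wind assumed less substitutable than coal or gas
price_changes = [0.0, -10.0, 0.0]    # gas-fired generation becomes 10% cheaper

x_coal, x_gas, x_wind = cresh_demand_changes(cost_shares, sigmas, price_changes)
print(x_coal, x_gas, x_wind)
```

In this example gas generation expands mostly at the expense of coal, because the less substitutable technology (wind) responds more weakly to the same relative price change.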
Modelling greenhouse gas emissions in BAEGEM. The greenhouse gas module 
tracks Kyoto gas emissions (CO2, CH4, N2O, HFCs, PFCs and SF6) over the
course of production, transformation, consumption, and combustion. For each time 
step, emissions pathways of Kyoto gases are derived from the quantities of these 
economic activities and changes in emission factors. The projections of radiative 
forcing agents other than Kyoto gases are selected from emission scenarios in MAGICC 
according to modelling criteria, assumptions and applications. Supplementary Table 2 
provides the list of the gases and the data sources for assigning emissions coeffi- 
cients for each sector in BAEGEM. 

BAEGEM assumes the constant proportionality of emissions with respect to the 
quantity of fossil fuel combusted over time. The disaggregated CO2 emissions for
the base year are derived from the GTAP 8.0 database with adjustments to ensure
that aggregate combustion emissions at country level are consistent with the IEA
combustion emission database.

Non-combustion emissions, such as fugitive emissions from fossil fuel mining,
enteric fermentation in livestock production and chemical transformation in manufacturing
processes, are assumed to change in proportion to their production levels
adjusted by EMF21 marginal abatement curves. The use of marginal abatement
curves in the module allows a gradual reduction of non-combustion emissions per
unit of output with additional reduction opportunities when carbon price increases.
The disaggregated non-CO2 emissions for the base year are derived from the US
Environmental Protection Agency database and the GTAP 7.0 database with adjustments
to ensure that aggregated non-CO2 emissions are consistent with the IEA
non-CO2 emissions database.
Overview of the GCAM model. GCAM is a global integrated assessment model 
of energy, economy, land-use, and climate. GCAM originates from the Edmonds 
and Reilly model. In this paper, we use the standard release of GCAM 3.1 with
the natural gas system specifically modified to reflect the common assumptions on
natural gas. GCAM is an open-source model primarily developed and maintained
at the Joint Global Change Research Institute. The full documentation of the model
is available at the GCAM wiki, and the following description is a summary of the
wiki documentation. 

GCAM is a long-term global model with particular emphasis on the represen- 
tation of human dimensions of the Earth system. GCAM integrates representations 
of the global economy, energy systems, agriculture and land use, with representa- 
tion of terrestrial and ocean carbon cycles, and a suite of coupled gas-cycle and 
climate models (Supplementary Fig. 2). 

The climate and physical atmosphere in GCAM are represented by MAGICC.
The emission trajectories of greenhouse gases are modelled in GCAM’s energy and
land-use components. GCAM is natively integrated with MAGICC 5.3, but for this
study we have used MAGICC 6.0 for the latest scientific knowledge and consistency
across models.

The global economy of GCAM is represented in 14 geopolitical regions, explicitly
linked through international trade in energy commodities, agricultural and
forest products, and other goods such as emissions permits. The scale of economic 
activity is driven by population size, age and gender, and labour productivity, which 
determine economic output in each region. The energy and land-use market equi- 
librium is established in each period by solving for a set of market-clearing prices for 
all energy and agricultural goods markets. This equilibrium is solved dynamic-recursively
at five-year intervals over the period 2005-2100. Supplementary Table 3 provides
an overview of the other key features of GCAM. 
Modelling energy system and natural gas in GCAM. In GCAM, the energy system 
represents processes of energy resource extraction, transformation, and delivery, 
ultimately producing services demanded by end users. Resources are classified as 
either depletable or renewable; in either case, the extraction costs of a given resource 
are assumed to increase as economically attractive resources are employed, but are 
also subject to technological progress which can lower extraction costs for a given 
resource grade. In each time period, the market prices of energy goods and services, 
including fossil fuel resources, are determined within the market equilibrium. 

Fossil fuel energy is produced from a graded, regionally disaggregated deple- 
table resource base. Renewable energy forms are also disaggregated by region and 
resource grade; however, by their nature, the resource is not consumed by use. Primary 
energy forms can be transformed into final energy products, including electricity, 
processed gas products, refined liquids, and so on. 

Energy transformation sectors convert resources initially into fuels consumed 
by other energy transformation sectors, and ultimately into goods and services con- 
sumed by end users. Multiple technologies compete for market share; shares are 
allocated among competing technologies using a logit choice formulation. The
cost of a technology in any period depends on two key exogenous input parameters— 
the non-energy cost and the efficiency of energy transformation—as well as the prices 
of the fuels it consumes. The non-energy cost represents all fixed and variable costs 
incurred over the lifetime of the equipment (except for fuel costs), expressed per unit 
of output. For example, a gas-fired electricity plant incurs a range of costs associated 
with construction (a capital cost) and annual operations and maintenance. The 
efficiency of a technology determines the amount of fuel required to produce each 
unit of output. The prices of fuels are calculated endogenously in each time period 
based on supplies, demands, and resource depletion. The depletion of economically 
available energy resources is explicitly tracked throughout the modelling period. 
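The cost and market-share logic in this paragraph can be sketched as follows. The share function is one common power-law form of a logit choice rule (share proportional to a weight times cost raised to a negative exponent); we assume it for illustration and do not claim it matches GCAM's exact implementation. The technologies, costs, share weights and exponent are hypothetical.

```python
from typing import Dict


def technology_cost(non_energy_cost: float, fuel_price: float, efficiency: float) -> float:
    """Cost per unit of output: non-energy cost plus fuel cost per unit of output."""
    return non_energy_cost + fuel_price / efficiency


def logit_shares(costs: Dict[str, float], share_weights: Dict[str, float],
                 exponent: float = -3.0) -> Dict[str, float]:
    """Allocate market shares: s_i = w_i * c_i^g / sum_j (w_j * c_j^g), with g < 0.

    A more negative exponent concentrates the market on the cheapest option;
    no technology takes the whole market unless its cost advantage is extreme.
    """
    scores = {tech: share_weights[tech] * cost ** exponent for tech, cost in costs.items()}
    total = sum(scores.values())
    return {tech: score / total for tech, score in scores.items()}


def electricity_costs(gas_price: float) -> Dict[str, float]:
    """Hypothetical costs in $ per GJ of electricity for three technologies."""
    return {
        "gas": technology_cost(20.0, gas_price, 0.55),
        "coal": technology_cost(30.0, 3.0, 0.40),
        "wind": technology_cost(45.0, 0.0, 1.0),   # no fuel input
    }


weights = {"gas": 1.0, "coal": 1.0, "wind": 1.0}
shares_ref = logit_shares(electricity_costs(gas_price=6.0), weights)
shares_cheap = logit_shares(electricity_costs(gas_price=3.0), weights)
print(shares_ref, shares_cheap)
```

Lowering the gas price raises the gas share at the expense of both coal and wind, which mirrors the substitution pattern examined in the main analysis.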

The natural gas resource supply curves for the two scenarios are based on the synthesis
by the GEA, as described above. In GCAM, natural gas can be used for direct
combustion in the end-use sectors or converted into other energy forms, such as
electricity, hydrogen or refined liquids, before being consumed in the end-use sectors.
Direct combustion and conversion to other forms both result in CO2 emissions. The
physical quantity of carbon is preserved throughout the energy system process. Once
natural gas is extracted, the carbon in the fuel is either emitted or sequestered. Non-CO2
emissions are tracked separately. The next subsection describes the treatment of
non-CO2 emissions in detail.

Modelling greenhouse gas emissions in GCAM. GCAM tracks 16 different green- 
house gases, aerosols and short-lived species. Supplementary Table 4 provides the 
list of the gases and the data sources for assigning emission coefficients for each 
sector in GCAM. 

Fossil fuel CO2 emissions are modelled according to the following method: (1) 
The total emission in the base year is calibrated to the Carbon Dioxide Information 
Analysis Center database**. (2) The fossil fuel consumption in the base year is cali- 
brated to the IEA’s Energy Balances Database”. (3) The average emission coeffi- 
cients are derived from the ratio of the total emission and the total fuel consumption 
for each fuel (coal, oil, and gas). (4) These emission coefficients are applied to each 
sector in the base year. (5) For future periods, GCAM solves for market shares of 
each fuel in each sector, and the emissions are calculated to be the product of emis- 
sion coefficients and the fuel consumption in each sector. 
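
The five steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the base-year totals, the sector fuel use and the function names are placeholders, not values from the Carbon Dioxide Information Analysis Center or IEA databases.

```python
def average_coefficients(base_emissions, base_consumption):
    """Step 3: emission coefficient per fuel = total emission / total consumption."""
    return {fuel: base_emissions[fuel] / base_consumption[fuel]
            for fuel in base_emissions}


def sector_emissions(coefficients, fuel_use):
    """Steps 4-5: emissions = sum over fuels of coefficient * fuel consumption."""
    return sum(coefficients[fuel] * quantity for fuel, quantity in fuel_use.items())


# Placeholder base-year data (emissions in MtCO2, consumption in EJ).
base_emissions = {"coal": 950.0, "oil": 730.0, "gas": 560.0}
base_consumption = {"coal": 10.0, "oil": 10.0, "gas": 10.0}
coefficients = average_coefficients(base_emissions, base_consumption)

# Hypothetical future fuel use in one sector (EJ), as if solved by the model.
future_power_sector_use = {"coal": 2.0, "gas": 3.0}
power_emissions = sector_emissions(coefficients, future_power_sector_use)
print(coefficients, power_emissions)
```

Because MtCO2 per EJ equals kgCO2 per GJ, the placeholder coefficients are expressed in the same units as Extended Data Table 3.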

Non-CO2 gases in the energy system are calculated according to the following 
method: (1) The total emission for each gas in the base year is calibrated to the RCP 
data'**". (2) Emissions by each sector in GCAM are compiled from the databases 
listed in Supplementary Table 4. (3) The individual emission coefficients for the 
base year are calculated by scaling the individual sector emissions to match their 
sum to the total emissions. (4) For future periods, GCAM solves for market shares 
of each technology in each sector, and the emissions are calculated by the product 
of emission coefficients and the technology usage level in each sector. (5) Future 
emission coefficients are assumed to improve over time with economic growth based 
on Energy Modelling Forum Study 21***'. 
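
Step (3) of this method, scaling the sector inventories so that they sum to the calibrated total, can be illustrated as follows. The sector names, numbers and function names are hypothetical, and the time-dependent improvement of coefficients in step (5) is deliberately left out of this sketch.

```python
def calibrated_coefficients(inventory_emissions, activity, total_emission):
    """Scale sector emissions to match the calibrated total, then divide by activity."""
    scale = total_emission / sum(inventory_emissions.values())
    return {sector: scale * inventory_emissions[sector] / activity[sector]
            for sector in inventory_emissions}


def project_emissions(coefficients, activity):
    """Emissions per sector = emission coefficient * technology usage level."""
    return {sector: coefficients[sector] * activity[sector] for sector in coefficients}


# Placeholder base-year data for one gas.
inventory = {"gas_extraction": 30.0, "coal_mining": 50.0}   # sums to 80
activity = {"gas_extraction": 100.0, "coal_mining": 200.0}
coefs = calibrated_coefficients(inventory, activity, total_emission=100.0)

base_year = project_emissions(coefs, activity)
print(coefs, sum(base_year.values()))
```

By construction, applying the scaled coefficients to base-year activity reproduces the calibrated total.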

Extended Data Table 3 shows the calculated emission coefficients of CO2 and 
CH4 for each fossil fuel. The fugitive CH4 emission for natural gas is modified 
in the sensitivity analysis to reflect the wide range of estimates in the literature. 

Overview of the MESSAGE model. MESSAGE***® is a linear-programming systems 
engineering optimization model used for medium- to long-term energy system 
planning and policy analysis. The model minimizes total discounted energy system 
costs, and provides information on the utilization of domestic resources, energy 
imports and exports and trade-related monetary flows, investment requirements, 
the types of production or conversion technologies selected (technology substitu- 
tion), pollutant and greenhouse gas emissions, and inter-fuel substitution processes, 
as well as temporal trajectories for primary, secondary, final, and useful energy. 

MESSAGE stands at the core of the IIASA integrated assessment framework", 
which combines a blend of different models to represent the global economy and 
the interactions between energy, agriculture, and forest sectors and their implica- 
tions for greenhouse gas emissions and associated climate responses. 

MESSAGE is linked to the macro-economic model MACRO for assessing eco- 
nomic feedbacks and price-induced changes of energy demand™. In the form used 
here, MACRO has its roots in a long series of models by Manne and Richels, the 
latest of which is MERGE 5.1. MACRO’s objective function is the total discounted 
utility of a single representative producer-consumer (for each of its 11 world regions). 
The maximization of this utility function determines a sequence of optimal savings, 
investment, and consumption decisions. In turn, savings and investment determine 
the capital stock. The capital stock, available labour, and energy inputs determine 
the total output of an economy according to a nested CES production function. 
Energy demand in six categories (industry electric and thermal, residential electric 
and thermal, transport, and non-energy use) is determined within the model, 
consistent with the development of energy prices and the energy intensity of GDP. 
When MACRO is linked to MESSAGE, internally consistent projections of GDP 
and energy demand are calculated in an iterative fashion that takes price-induced 
changes of demand and GDP into account. This is achieved through iterations 
between the two models, in which demand, energy system costs and energy prices 
are exchanged until the solutions of both models converge. For details of the 
iterative model linkage, see ref. 62. 
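
The iterative exchange between the two models can be pictured as a fixed-point iteration. The toy Python sketch below is only an analogy: the two functions stand in for the energy system model and the macro-economic model with invented functional forms and parameters, and it is not the MESSAGE-MACRO linkage algorithm described in ref. 62.

```python
def energy_model_price(demand):
    """Stand-in for the energy model: price rises with demand (hypothetical form)."""
    return 1.0 + 0.5 * demand


def macro_model_demand(price):
    """Stand-in for the macro model: demand falls with price (hypothetical form)."""
    return 10.0 * price ** -0.3


def iterate_to_convergence(demand=5.0, tol=1e-9, max_iter=1000):
    """Exchange demand and price between the two stand-ins until demand settles."""
    for iteration in range(1, max_iter + 1):
        price = energy_model_price(demand)
        new_demand = macro_model_demand(price)
        if abs(new_demand - demand) < tol:
            return new_demand, price, iteration
        demand = new_demand
    raise RuntimeError("no convergence within max_iter iterations")


demand, price, iterations = iterate_to_convergence()
print(demand, price, iterations)
```

At convergence the demand assumed by the price calculation and the demand implied by that price agree to within the tolerance, which is the sense in which the projections are internally consistent.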

In addition to the energy sector, MESSAGE represents the greenhouse gas emissions 
from land-use changes in the agricultural and forest sectors. For the calculation 
of physical climate responses, MESSAGE is coupled with MAGICC®. MESSAGE is 
natively integrated with MAGICC 5.3, but for this study we have used MAGICC 6.0 
for the latest scientific knowledge and consistency across models. Supplementary 
Table 5 provides an overview of the other key features of MESSAGE. 

Modelling energy system and natural gas in MESSAGE. A typical model application 
is constructed by specifying performance characteristics of a set of technologies 
and defining a Reference Energy System that includes energy technologies 
and flows along the entire energy chain. In the course of a model run MESSAGE 
determines how much of the available technologies and resources is actually used 
to satisfy a particular end-use demand, subject to various constraints, while mini- 
mizing total discounted energy system costs. A simplified illustration of the MESSAGE 
Reference Energy System is shown in Supplementary Fig. 3. 

The representation of the energy system includes explicit tracking of the long-lived 
energy infrastructure by vintage, which allows for consideration of the timing 
of technology diffusion and substitution, the inertia of the system for replacing exist- 
ing facilities with new generation systems, clustering effects (technological inter- 
dependence) and possible phenomena of increasing returns (that is, the more a 
technology is applied the more it improves and widens its market potentials). Com- 
bined, these factors can lead to ‘lock-in’ effects and path dependency (change 
occurs in a persistent direction based on an accumulation of past decisions). As a 
result, technological change can go in multiple directions, but once change is initiated 
in a particular direction, it becomes increasingly difficult to change its course. 

Important inputs for MESSAGE are technology costs and technology perfor- 
mance parameters. For the scenarios included in this paper, technical, economic and 
environmental parameters for over 100 energy technologies are specified explicitly 
in the model. Costs of technologies are assumed to decrease over time as experience 
(measured as a function of cumulative output) is gained. Assumptions for the main 
energy conversion technologies are summarized in ref. 68. The regional energy 
costs are based on IEA®. For carbon capture and storage technologies, the power 
sector applications are based on ref. 70 and the liquid conversion processes are based 
on refs 71-73. Biomass technology costs are based on ref. 71. For the evolution of 
technology costs over time we adopt the assumptions of the GEA-Mix scenario of 
the Global Energy Assessment”. 

Fossil fuel resource estimates and potentials for renewable energy are another 
important set of input parameters. For fossil fuel availability the model distinguishes 
between conventional and unconventional resources for different categories of oil, 
gas, or coal occurrences”. With regard to volumes for coal and oil we mainly follow 
the quantitative assumptions adopted by the GEA”. Resource assumptions for 
natural gas are different across the scenarios in this paper and were specifically updated 
from ref. 12 to represent 14 different occurrences of natural gas for each of the 11 
MESSAGE regions. Energy losses (own use) of natural gas extraction are modelled 
explicitly and range from close to zero up to 25% of the extracted gas, depending on 
the type of natural gas occurrence. Fugitive CH4 emissions from natural gas extraction 
are assumed to range from close to zero to 5% of the extracted natural gas and 
increase for unconventional gas resources. Assumptions about energy losses and 
fugitive emissions are based on ref. 77. For renewable energy resource potentials we 
rely on spatially explicit analysis of resource availability and adopt the assumptions 
discussed in ref. 68. 

Representation of natural gas infrastructure in the MESSAGE model comprises 
explicit technologies for extraction, transmission and distribution, trade, conversion 
and use of natural gas in appliances of various service sectors. Main energy 
conversion technologies include various types of power generation technologies, 
heat generation (including combined heat and power facilities), hydrogen genera- 
tion, and gas-to-liquid supply chains. Intra-regional trade options include piped gas 
as well as liquefied petroleum gas. Natural gas consumption of end-use appliances 
is modelled at the level of three main energy end-use sectors: residential 
and commercial, industry, and transport. CO2 emissions are modelled 
along the conversion chain and are either vented to the atmosphere or sequestered 
underground in the case of carbon capture and storage. 

Modelling greenhouse gas emissions and CH4 in MESSAGE. In addition to CO2 
emissions, the MESSAGE model considers the full basket of non-CO2 greenhouse 
gases (CH4, N2O and F-gases) as well as emissions from other radiatively active 
substances from the energy, industrial and non-energy sectors of the economy 
(disaggregated for each of the model’s eleven regions). These include particulate matter 
(PM2.5), sulphur dioxide (SO2), nitrogen oxides (NOx), volatile organic compounds 
(VOC), carbon monoxide (CO), black carbon (BC), organic carbon (OC), and 
ammonia (NH3). Representation of non-CO2 gases in MESSAGE is described in 
detail in ref. 63. Here, we primarily focus on CH4. 

CH4 emissions are calibrated for the base year to the RCP inventory'®. The model 
represents CH4 sources by linking appropriate emission coefficients to various activity 
variables in the model. These include coal, oil and gas extraction and transportation, 
and energy-related fossil fuel and biomass combustion. We assume gradual 
technological improvements in regions with high coefficients for these energy- 
related sources, such as reduced future pipeline leakage in the gas sector in the form 
of decreased emission coefficients. As explained earlier, in the extraction sector emis- 
sions coefficients are different across different natural gas occurrences, and emissions 
can thus increase when shifting from conventional to unconventional occurrences. 

For livestock- and agriculture-related CH4, sector-specific drivers are used to 
project emissions into the future, whereas emissions factors decline for these sources 
over time, consistent with the projected productivity improvements in livestock man- 
agement and agricultural production”®. 

For CH4 emissions from solid waste, we use IPCC country-specific mass-balance 
methodology” to obtain estimates of current emissions. We then examine long-term 
trends in waste generation rates, recycling, and gas recovery to develop long-term 
emission projections. Based on land availability constraints and current trends in most 
developed countries, the rates of recycling and incineration are assumed to increase 
around the world, thus leading to a lower share of waste in landfills. 

MESSAGE also considers the recovery of CH4 in energy and non-energy sectors. 
In the energy sector, CH4 may be captured from coal mining (through degasification 
systems) and fed into the energy system. In the solid waste sector, the 
recovered CH4 from landfills can be used as gas by the industrial sector or converted 
to electricity for end use. The resulting CH4 emission factors of different fossil fuels 
are shown in Extended Data Table 3. 

Overview of the REMIND model. REMIND is a global multi-regional model of 
the energy-economy-climate system spanning the period 2005-2100, with 5-year 
time steps between 2005 and 2060, and 10-year time steps thereafter. The periods 
2005 and 2010 are used for calibration purposes. The scenarios start to differ from 
2015 onwards. The world is divided into 11 regions: five individual countries (China, 
India, Japan, United States of America, and Russia) and six aggregated regions formed 
by the remaining countries (the European Union, Latin America, sub-Saharan Africa 
without South Africa, a combined Middle East/North Africa/Central Asia region, 
other Asia, and the rest of the world). 

The macro-economic core of REMIND is an intertemporal general equilibrium 
model of economic growth with perfect foresight that is solved using optimization 
methods to compute the market equilibrium with full cooperation between regions. 
This approach is similar to RICE* and MERGE™. The macro-economic production 
function takes as input capital, labour and final energy. The resulting economic 
output is then available for investments into the macro-economic capital stock as 
well as for consumption, trade of goods, and financing the energy system. Macro- 
economic consumption, exogenous population and the pure rate of time prefer- 
ence of 3% per year determine the welfare in each region. 

An overview of the REMIND model is shown in Supplementary Fig. 4. The main 
features are summarized in Supplementary Table 7. The model has been published 
in the academic literature*'* and a full model description is available online”’. 

The REMIND model participated in a number of model comparison studies*****. 
The energy sector and its sub-components have been reviewed in a number of model 
comparison studies. REMIND performed reasonably compared to the other par- 
ticipating models. REMIND showed particular strength in the sensitivity of price- 
quantity changes on fossil fuel markets and the international trade of fossil fuels”**. 

Modelling energy systems and natural gas in REMIND. The energy system is hard-linked 
to the macro-economic core via final energy demand and costs incurred by 
the energy system*. Final energy demand is represented by a nested constant 
elasticity of substitution (CES) production function and includes 
transport energy, electricity, and various non-electric energy types for stationary 
end uses. This means that final energy demands are price responsive depending on 
the substitution elasticities. 
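
To make the role of the substitution elasticities concrete, here is a minimal Python sketch of a two-level nested CES aggregate. The nesting structure, share parameters and elasticities are invented for illustration and do not correspond to REMIND's calibrated production function.

```python
def ces(inputs, shares, sigma):
    """CES aggregate with elasticity of substitution sigma (sigma must not equal 1)."""
    rho = (sigma - 1.0) / sigma
    return sum(a * x ** rho for a, x in zip(shares, inputs)) ** (1.0 / rho)


def final_energy_nest(electricity, non_electric, transport):
    """Hypothetical nest: stationary energy first, then combined with transport."""
    stationary = ces([electricity, non_electric], [0.5, 0.5], sigma=1.5)
    return ces([stationary, transport], [0.6, 0.4], sigma=0.5)


balanced = final_energy_nest(2.0, 2.0, 2.0)
doubled = final_energy_nest(4.0, 4.0, 4.0)
print(balanced, doubled)
```

With share parameters that sum to one in each nest, equal inputs reproduce themselves and doubling all inputs doubles the aggregate; a higher sigma within a nest means its inputs substitute for each other more easily when relative prices change.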

The energy sector supplies final energy. The conversion of primary energy into 
secondary energy carriers as well as the distribution of secondary energy carriers to 
end-use sectors is represented by capacity stocks of more than 50 technologies in 
which costs of investment and operation and maintenance are also accounted for. 
System inertias are represented via the vintage capital structure and adjustment 
cost for accelerating capacity ramp-up. Therefore, primary energy demands are 
price elastic and depend on price-elastic final energy demands, all relative prices 
and system flexibilities in the energy sector. The price-responsive primary energy 
demands are crucial for the results of this study derived with the REMIND model. 
The effect of additional gas supplies acts on highly interdependent energy markets, 
and price-responsive energy demands are the main trigger for second-order effects 
due to the gas supply expansion. 

The supply side of exhaustible primary energy sources (coal, oil, gas and uranium) 
assumes cumulative extraction cost functions in each region. In addition to the 
cumulative extraction cost function the fossil fuel extraction sector distinguishes 
different grades with an upper limit of supply, specific extraction costs and decline 
rates. The intertemporal general equilibrium, therefore, reflects producer rents and 
scarcity rents of the fossil fuel extraction sector. Major subsidies for fossil fuels are 
also reflected*’. Natural gas can supply the power sector and supply gases for sta- 
tionary use reflecting the residential, the service and the industry sector. The fossil 
fuel sector'® and the nuclear power sector* are fully documented elsewhere. The 
supply of renewable primary energy comprises renewable energy potentials (bio- 
mass, hydro power, wind power, solar energy, and geothermal energy). Renewable 
energy and storage technologies feature technology learning, reducing investment 
costs with increasing installed capacity. Furthermore, the integration of fluctuating 
renewables is subject to integration costs that imply diminishing returns 
depending on the market share. Bio-energy supply and land-use emissions are 
consistent with the land-use model MAgPIE*. 
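
Technology learning of the kind mentioned above is often written as a one-factor learning curve, in which investment cost falls by a fixed fraction for every doubling of installed capacity. The sketch below assumes that functional form and uses made-up numbers; the text does not state which learning formulation or learning rates REMIND uses.

```python
import math


def learning_cost(initial_cost, initial_capacity, capacity, learning_rate):
    """One-factor learning curve (assumed form).

    Each doubling of cumulative installed capacity lowers the investment
    cost by the fraction `learning_rate`.
    """
    exponent = -math.log2(1.0 - learning_rate)
    return initial_cost * (capacity / initial_capacity) ** (-exponent)


# Hypothetical technology: 1000 cost units per kW at 10 GW, 20% learning rate.
cost_after_one_doubling = learning_cost(1000.0, 10.0, 20.0, 0.2)
cost_after_two_doublings = learning_cost(1000.0, 10.0, 40.0, 0.2)
print(cost_after_one_doubling, cost_after_two_doublings)
```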

International trade is explicitly represented assuming a world market for final 
goods and primary energy carriers (fossil fuels, uranium and bio-energy). Importers 
and exporters of primary energy have to pay trading costs, which induce regional 
price differentials. For natural gas, trading costs are substantial, and energy 
losses during transportation are also considered. Trading costs are a crucial 
factor for the diverse development of regional energy systems. A major shift from 
conventional to abundant gas triggers a change of the relative geo-spatial distri- 
bution of gas endowments. In the abundant gas case the endowments are not only 
larger, but also geographically more evenly distributed, which improves domestic 
gas supply in many regions that are considered major importers in the conven- 
tional gas case. 

Modelling greenhouse gas emissions in REMIND. Finally, the model calculates 
energy-related CO2 and non-CO2 greenhouse gas (for example, CH4 and N2O) as 
well as aerosol emissions via time-dependent emission factors (see Supplementary 
Table 8). The CH4 emission factors of extraction activities are differentiated 
by fossil fuel and region but remain constant over time (see Extended 
Data Table 3). The marginal abatement cost functions map marginal abatement 
costs to relative reduction from the baseline level and are linearly scaled with the 
activity-dependent baseline. These time-dependent marginal abatement cost curves 
are employed to inform the model about mitigation solutions that prevent CH4 
leakages and make this gas available for supply. Hence, as gas prices increase over 
time by moving to higher cost deposits, the incentive to invest in mitigation technologies 
increases and so CH4 emission factors are effectively reduced. The strength 
of the effect depends on the endogenous gas price. The greenhouse gas emissions 
representation in the REMIND model has been fully reported in the peer-reviewed 
literature”. 
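
The mechanism in this paragraph, a marginal abatement cost curve that maps a price to a relative reduction from an activity-scaled baseline, can be sketched as follows. The curve points, emission factor and function names are hypothetical placeholders, not REMIND data.

```python
from bisect import bisect_right

# Hypothetical curve: (gas price, fraction of baseline CH4 abated at that price).
MAC_CURVE = [(0.0, 0.0), (2.0, 0.2), (5.0, 0.5), (10.0, 0.7)]


def abatement_fraction(price, curve=MAC_CURVE):
    """Piecewise-linear interpolation of the relative reduction at a given price."""
    prices = [p for p, _ in curve]
    if price <= prices[0]:
        return curve[0][1]
    if price >= prices[-1]:
        return curve[-1][1]
    i = bisect_right(prices, price)
    (p0, f0), (p1, f1) = curve[i - 1], curve[i]
    return f0 + (f1 - f0) * (price - p0) / (p1 - p0)


def ch4_emissions(activity, baseline_factor, price):
    """Baseline scales linearly with activity; abatement removes a price-dependent share."""
    return activity * baseline_factor * (1.0 - abatement_fraction(price))


low_price = ch4_emissions(activity=100.0, baseline_factor=0.5, price=0.0)
high_price = ch4_emissions(activity=100.0, baseline_factor=0.5, price=3.5)
print(low_price, high_price)
```

As the price rises, the effective emission factor (emissions divided by activity) falls, even though the baseline factor itself is held constant.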

Overview of the WITCH model. WITCH” is a dynamic global model that inte- 
grates the most important elements of climate change in a unified framework. The 
economy is modelled through an inter-temporal optimal growth model which cap- 
tures the long-term economic growth dynamics. A compact representation of the 
energy sector is fully integrated (hard linked) with the rest of the economy so that 
energy investments and resources are chosen optimally, together with the other 
macroeconomic variables. Land-use mitigation options are available through a soft 
link with a land-use and forestry model (GLOBIOM)”*. Emission scenarios are pro- 
cessed through a simple climate model calibrated to MAGICC6” to compute future 
climate outcomes. Climate change impacts on the economic output are captured 
through a damage function, accounting for implicit adaptation decisions. Explicit 
investment in additional adaptation efforts can reduce the damages associated with 
temperature change. Feedback loops between economy and climate are thus fully 
integrated in WITCH to simulate the intertemporal trade-offs between costs of 
climate change mitigation, adaptation, and residual damages. 

WITCH represents the world in a number (currently 13) of representative native 
regions (or coalitions of regions); as shown in Supplementary Fig. 5, for each it 
generates optimal mitigation and adaptation strategies for the long term (2005 to 
2100), as a result of a maximization process in which the welfare of each region (or 
coalition of regions) is chosen strategically and simultaneously to other regions. 
This makes it possible to capture regional free-riding behaviours and strategic inter- 
action induced by the presence of global externalities. In this game-theory set-up, 
regional strategic actions interrelate through greenhouse gas emissions, dependence 
on exhaustible natural resources, trade of oil and carbon permits, and technol- 
ogy research and development. The endogenous representation of research-and- 
development diffusion and innovation processes constitutes a distinguishing feature 
of WITCH. This approach gives the possibility to explore how research-and- 
development investments in energy efficiency and carbon-free technologies integrate 
the currently available mitigation options. The model features multiple externalities, 
both on the climate and the innovation side. The technology externality is modelled 
via international spillovers of knowledge and experience across countries and 
time?!”’. This formulation of technical change affects both decarbonization and 
energy savings. Supplementary Table 9 provides an overview of the other key 
features of WITCH. 

Modelling energy system and natural gas in WITCH. In WITCH, the energy 
sector is fully integrated with the rest of the economy. It is divided into an electric 
sector, a transportation sector, and an aggregated non-electric (industry and 
residential) sector. The energy sector is described by a production function that 
aggregates different factors at various levels and with associated elasticities of sub- 
stitution. All the main energy carriers and technologies are included. 

Natural gas is used in the industry and residential sector as well as for generating 
electricity. Gas power is available with and without carbon capture and storage. 
WITCH also tracks CH4 emitted in the non-energy sector. The marginal price of 
natural gas, along with the other energy carriers, is determined by cumulative global 
extraction and available resources. Natural gas is traded among the 13 regions, which 
can buy it from or sell it to a common pool. Bilateral trade between each pair of regions is not 
accounted for. This requires the modeller to vet the trade pattern results carefully 
when modelling regionally heterogeneous effects. However, this poses little problem 
in a world of abundant gas availability, where the global gas market is expected 
to be more integrated and the role of bilateral contracts to be less pronounced. 

Modelling greenhouse gas emissions in WITCH. The model generates the 
greenhouse gases reported in Supplementary Table 10, either directly or via exo- 
genous assumptions. Mitigation can happen through technology substitution or 
storage, direct reduction via marginal abatement cost curves or end of pipe via emission 
factors. CO2 emission factors are reported in Extended Data Table 3. Emission 
trajectories are processed through the MAGICC6 climate model, which calculates 
the climate outcome. 

Overview of the MAGICC model. Throughout the analysis we use MAGICC6 for 
simulating radiative forcing and temperature change. MAGICC is a simple carbon-cycle 
climate model originally developed by Wigley and Raper®*”’. Version 6 is 
updated to emulate the simulations from large-scale climate models and carbon- 
cycle models as represented in the Coupled Model Intercomparison Project 3 
(CMIP3)*° and Coupled Climate Carbon Cycle Model Intercomparison Project 
(C4MIP)**. See ref. 22 for the documentation of the calibration process. 

MAGICC has been traditionally used in the IAMs, and most prominently in the 
development of the RCPs”*”*. The RCP scenarios in turn were used in large-scale 
climate models to simulate the future climate in the IPCC Fifth Assessment Report”. 

Although a simple climate model like MAGICC is by no means a sufficient sub- 
stitute for the large-scale climate models, its careful calibration to the large-scale 
climate models and validation exercises ensure that the direction and the magnitude of 
impact are consistent with the current scientific understanding of the climate system. 
With its flexible structure and fast runtime, MAGICC can be readily integrated 
into IAMs. This smaller computational burden allows IAMs to simulate 
more future scenarios without needing a supercomputer to run a large-scale climate 
model. 

For this study, we have processed all emission trajectories from the IAMs through 
MAGICC6 to obtain radiative forcing and temperature change. MAGICC allows 
emulation of a number of climate models. This study uses the default setting used 
for RCP analysis. The RCP default setting uses median estimates from the climate 
model inter-comparison exercises CMIP3* and C4MIP™. For emulation of the 
carbon cycle, the C4MIP Bern-CC model” was chosen as it represents the 
middle of the range of C4MIP results, and a climate sensitivity of 3 °C is used. The full 
documentation of the RCP default setting is available in ref. 96. 

The coverage of greenhouse gases and other forcing agents differs widely across 
the five models. The range of forcing agents covered by each model is available in 
Supplementary Tables 2, 4, 6, 8, and 10. All models endogenously model CO2. All but 
WITCH endogenously model CH4 and N2O. GCAM, MESSAGE, and REMIND 
endogenously model aerosols and other short-lived species. For the minor forcing 
agents that the models do not endogenously simulate, we have used exogenous 
emissions trajectories from the RCP8.5 scenario” because it best approximates our 
baseline scenario. Forcing from the secondary effects of emissions, such as indirect cloud 
formation or atmospheric chemistry feedback from non-methane hydrocarbons 
and other reactive gases, is modelled natively in MAGICC”’. 

Extended Data Figure 1 | Radiative forcing composition for high fugitive 
methane scenarios. a, Year 2010 and year 2050 composition of radiative 
forcing for the Conventional Gas scenario with high fugitive methane for the five 
models. b, Year 2050 relative difference in radiative forcing (the Abundant Gas 
scenario minus the Conventional Gas scenario), both with the high fugitive methane 
assumption, for the five models. A 1% difference in forcing for the model average is 
equivalent to 0.044 W m^-2. 

Extended Data Figure 2 | Global natural gas supply curves. The current natural gas supply curves provided by the Global Energy Assessment’”. Future cost 
reduction assumptions are documented in the Methods. 

Extended Data Figure 3 | Natural gas supply curve sensitivity analysis. 
a, Global natural gas consumption. b, CO2 emissions from fossil fuels. 
c, Total radiative forcing. d, Global mean surface temperature change (from fraction of cost reduction over 2010-2050. 

Extended Data Figure 4 | Uncertainty ranges in principal components of 
model projections. a, Global population. b, Global GDP. c, Total primary 
energy consumption. d, Fossil fuel and industrial CO2 emissions. Coloured 
lines are model-reported values from this study. Shaded areas are ranges of 
projections found in the literature obtained from the IPCC AR5 database”. 

Extended Data Table 1 | Cost reduction in low-carbon energy technologies over 2010-2050 in the Abundant Gas scenario 

                      BAEGEM  GCAM  REMIND  WITCH  units
Solar Photovoltaics   36%     63%   21%     53%    %
Wind Turbine          37%     21%   13%     40%    %
Nuclear Powerplant    29%     15%   0%      7%     %

PV, photovoltaics. 

Extended Data Table 2 | CO2 emissions in 2050 from fossil fuels and industry with standard energy market assumptions and with the coal-substitution-only assumption 

Assumption              Scenario          BAEGEM  GCAM  MESSAGE  REMIND  WITCH  units
Standard                Conventional Gas  48.5    63.1  63.4     61.7    63.8   GtCO2
                        Abundant Gas      54.0    62.1  62.9     64.6    63.9   GtCO2
                        Difference        5.5     -1.1  -0.5     2.9     0.0    GtCO2
Coal substitution only  Conventional Gas  48.5    63.1  63.4     61.7    63.8   GtCO2
                        Abundant Gas      51.9    60.4  59.7     60.4    63.8   GtCO2
                        Difference        3.3     -2.8  -3.8     -1.3    -0.0   GtCO2

Extended Data Table 3 | 2050 emission factors for fossil fuels in each model 

Gas  Fuel  BAEGEM  GCAM  MESSAGE  REMIND  WITCH  units
CO2  Coal  101     100   95-101   96      90     kgCO2/GJ
     Oil   79      72    73       68      70     kgCO2/GJ
     Gas   59      52    56       56      55     kgCO2/GJ
CH4  Coal  0.21    0.14  0.39     0.12    N/A    kgCH4/GJ
     Oil   0.11    0.06  0.06     0.06    N/A    kgCH4/GJ
     Gas   0.32    0.35  0.31     0.52    N/A    kgCH4/GJ

CO2 emission factors specify the average carbon content of the fuel. CH4 emission factors specify average fugitive methane emissions associated with production and transportation of each fossil fuel reported for the Abundant Gas scenario. 

Extended Data Table 4 | 2050 anthropogenic radiative forcing with standard fugitive methane emission assumptions and with high fugitive 
methane emission assumptions 

Assumption     Scenario          BAEGEM  GCAM  MESSAGE  REMIND  WITCH  units
Standard       Conventional Gas  3.97    4.46  4.25     4.16    4.38   W m^-2
               Abundant Gas      4.07    4.49  4.37     4.46    4.37   W m^-2
               Difference        0.10    0.02  0.12     0.31    0.01   W m^-2
High fugitive  Conventional Gas  4.20    4.58  4.51     4.32    4.49   W m^-2
               Abundant Gas      4.44    4.69  4.83     4.85    4.50   W m^-2
               Difference        0.24    0.11  0.32     0.53    0.01   W m^-2

Producing more grain with lower environmental costs 


Xinping Chen'*, Zhenling Cui'*, Mingsheng Fan', Peter Vitousek”, Ming Zhao*, Wenqi Ma‘, Zhenlin Wang”, Weijian Zhang’, 
Xiaoyuan Yan°, Jianchang Yang’, Xiping Deng®, Qiang Gao’, Qiang Zhang"®, Shiwei Guo", Jun Ren”, Shiqing Li®, Youliang Ye’, 
Zhaohui Wang", Jianliang Huang’, Qiyuan Tang'®, Yixiang Sun!”, Xianlong Peng'®, Jiwang Zhang”, Mingrong He®, Yunji Zhu’, 
Jiquan Xue", Guiliang Wang', Liang Wu', Ning An!, Liangquan Wut, Lin Ma!, Weifeng Zhang! & Fusuo Zhang" 


Agriculture faces great challenges to ensure global food security by 
increasing yields while reducing environmental costs’”. Here we 
address this challenge by conducting a total of 153 site-year field exper- 
iments covering the main agro-ecological areas for rice, wheat and 
maize production in China. A set of integrated soil-crop system man- 
agement practices based on a modern understanding of crop ecophys- 
iology and soil biogeochemistry increases average yields for rice, wheat 
and maize from 7.2 million grams per hectare (Mg ha^-1), 7.2 Mg ha^-1 
and 10.5 Mg ha^-1 to 8.5 Mg ha^-1, 8.9 Mg ha^-1 and 14.2 Mg ha^-1, 
respectively, without any increase in nitrogen fertilizer. Model sim- 
ulation and life-cycle assessment’ show that reactive nitrogen losses 
and greenhouse gas emissions are reduced substantially by integrated 
soil-crop system management. If farmers in China could achieve 
average grain yields equivalent to 80% of this treatment by 2030, over 
the same planting area as in 2012, total production of rice, wheat and 
maize in China would be more than enough to meet the demand for 
direct human consumption and a substantially increased demand 
for animal feed, while decreasing the environmental costs of inten- 
sive agriculture. 

Global agriculture is facing unprecedented challenges and risks. Rates
of yield growth have slowed since the 1980s (ref. 4), and even stagnated
in many areas. Meanwhile, agriculture incurs substantial environmental
costs, including emissions of greenhouse gases, loss of biodiversity, and
degradation of land and freshwater. These challenges may grow in
the future, because global food demand is likely to double by 2050
(reflecting both population growth and increased consumption of animal
protein) against a backdrop of a changing climate and growing competition
for land, water, labour and energy. The human and environmental costs
of expanding agricultural lands are such that most of the necessary
production gains must be achieved on existing farmland. Can the necessary
increase in yields be accomplished? If so, can the environmental costs of
intensive agriculture be mitigated?

We addressed these questions through quantitative field experiments
under large-scale agro-ecological conditions. Our experiments included
the three main staple crops (rice, wheat and maize), which together account
for most global cereal production, in the main agro-ecological areas
of China. We focus on China in part because the yields of these crops
already are relatively high there, thanks to 'green revolution'
technologies, and in part because China must address the joint challenges
of production and environmental degradation expeditiously.

From 2009 to 2012, we conducted a total of 153 site-year field
experiments (Extended Data Fig. 1). In each experiment four treatments were
employed: (1) current practice (the farmers' practice in the region but
conducted in experimental plots); (2) improved practice (which modified
current practice to offset the major limitations to crop growth); (3)
high-yielding (which maximized yields without regard to costs); and (4)
integrated soil-crop system management (ISSM, which used advanced crop
and nutrient management). ISSM redesigned the whole production system
based on the local environment, drawing upon appropriate crop
varieties, sowing dates, densities and advanced nutrient management.
The ISSM concept had been developed for maize systems; we applied
it to a broad range of field situations for wheat and rice in addition to
maize. The challenge of increasing yield while reducing environmental
costs is greater for tiller crops such as rice and wheat because they change
in population structure within crop growing seasons (Supplementary
Discussion). In addition to our experiments, we determined yields and
nitrogen use in 18,938 farmers' fields in the main cereal production areas
of China (Extended Data Table 1).

Our highest yields were achieved in high-yielding treatments, with 8.8,
9.2 and 14.4 Mg ha⁻¹ for rice, wheat and maize, respectively (Table 1).
These yields are comparable to yield potentials in the areas with the
most favourable conditions and intensive agronomic management
globally: rice in California (USA) (~9 Mg ha⁻¹) (ref. 5), wheat in Germany
(9.5 Mg ha⁻¹), and rainfed and irrigated maize in the USA (13.2 and
15.1 Mg ha⁻¹) (ref. 16). The ISSM treatment achieved 97-99% of the yields
in the high-yielding treatments, and the improved practice treatments
achieved 88-92% of high-yielding yields; all increased yields (Table 1)
and nitrogen uptakes were significantly greater than in the current
practice treatment (Extended Data Table 2).

Nitrogen fertilizer application rates were greatest in the high-yielding
treatment, and decreased in the order current practice then ISSM then
improved practice (Table 1). High nitrogen surplus (nitrogen fertilizer
applied in excess of uptake by crops) and low nitrogen use efficiency
(PFP_N, nitrogen partial factor productivity, in kilograms of grain per
kilogram of nitrogen applied) occurred in high-yielding (owing to high
nitrogen application) and current practice (owing to low grain yield)
treatments, indicating the inefficiency and environmental damage associated
with both conventional practices and with attempts to increase yields
simply by increasing inputs. Compared with current practice, the
nitrogen rates in improved practice and ISSM decreased slightly even as
yields increased substantially.

In the improved practice and ISSM treatments, nitrogen surplus was
around zero with only a small range from -9 to 16 kg N ha⁻¹, and PFP_N
reached 54-57, 41-44 and 56-59 kg kg⁻¹ for rice, wheat and maize,
respectively (Table 1). These nitrogen use efficiencies are comparable to
those of most 'ecologically intensive' systems worldwide.
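
The two efficiency measures used above can be sketched in a few lines. This is a minimal illustration, assuming only the definitions given in the text (PFP_N as kilograms of grain per kilogram of nitrogen applied, and nitrogen surplus as nitrogen applied minus crop uptake); the function names are hypothetical, and the inputs are the rice ISSM treatment means from Table 1 and Extended Data Table 2.

```python
def partial_factor_productivity(yield_mg_ha: float, n_rate_kg_ha: float) -> float:
    """Return PFP_N in kg of grain per kg of nitrogen applied."""
    # 1 Mg = 1000 kg, so convert the yield before dividing by the N rate.
    return yield_mg_ha * 1000.0 / n_rate_kg_ha


def nitrogen_surplus(n_rate_kg_ha: float, n_uptake_kg_ha: float) -> float:
    """Return N applied minus above-ground crop N uptake, in kg N per ha."""
    return n_rate_kg_ha - n_uptake_kg_ha


# Rice ISSM treatment means: yield and N rate from Table 1,
# crop N uptake from Extended Data Table 2.
rice_issm_pfp = partial_factor_productivity(8.5, 162.0)
rice_issm_surplus = nitrogen_surplus(162.0, 147.0)

print(f"PFP_N: {rice_issm_pfp:.1f} kg kg^-1")
print(f"N surplus: {rice_issm_surplus:.0f} kg N ha^-1")
```

Applied to treatment means, this gives roughly 52 kg kg⁻¹ and 15 kg N ha⁻¹, close to but not identical with the 54 and 16 reported in Table 1, presumably because the table averages values computed for each experiment.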
Table 1 | Grain yield, nitrogen application rate, PFP_N and nitrogen surplus for rice (n = 57), wheat (n = 40) and maize systems (n = 56) for the
four management treatments in field experiments compared with farmers' practice from a total of 18,938 farmers

Crop    Treatment                       Yield (Mg ha⁻¹)   N rate (kg N ha⁻¹)   PFP_N (kg kg⁻¹)   N surplus (kg N ha⁻¹)
Rice    Current practice                7.2 ± 1.1         181                  41                58
        Improved practice               8.1 ± 1.1         146                  57                7
        High-yielding system            8.8 ± 1.2         192                  47                38
        ISSM                            8.5 ± 1.2         162                  54                16
        Farmers' practice (n = 6,592)   7.0 ± 1.5         209                  41                82
Wheat   Current practice                7.2 ± 1.4         257                  28                74
        Improved practice               8.3 ± 1.7         192                  44                -9
        High-yielding system            9.2 ± 1.5         283                  33                50
        ISSM                            8.9 ± 1.7         220                  41                2
        Farmers' practice (n = 6,940)   5.7 ± 1.3         210                  33                74
Maize   Current practice                10.5 ± 1.6        266                  40                72
        Improved practice               12.6 ± 2.2        214                  59                -8
        High-yielding system            14.4 ± 2.4        402                  37                140
        ISSM                            14.2 ± 2.6        256                  56                8
        Farmers' practice (n = 5,406)   7.6 ± 1.5         220                  43                72

Means ± s.d. for yield. Least significant difference testing was performed among the four experimental treatments for each crop at P < 0.05.



Reactive nitrogen losses and greenhouse gas (GHG) emissions from
agriculture contribute substantially to atmospheric and water pollution
in China and elsewhere. Using established empirical models (Extended
Data Figs 2-4 and Supplementary Discussion) and life-cycle assessment
methods, we evaluated total reactive nitrogen losses and GHG
emissions per unit area (expressed as kilograms of nitrogen per hectare or
kilograms of carbon dioxide equivalents per hectare), and their intensity
per unit grain yield (expressed as kilograms of nitrogen losses or carbon
dioxide equivalents per million grams). Total reactive nitrogen losses
and GHG emissions both for improved practice and ISSM treatments
decreased compared with current practice, while high-yielding
significantly increased total reactive nitrogen losses and GHG emissions
(except GHG emissions in rice systems) (Fig. 1).

Reactive nitrogen losses in maize systems were higher than those in
wheat and rice systems (Fig. 1), mainly because of high nitrate leaching
and ammonia volatilization in maize's summer growing season. The total
GHG emissions from rice were highest because of high methane (CH₄)
emissions (Fig. 1). Nitrogen fertilizer production, transportation and
application contributed substantially to the difference in total GHG
emissions among treatments, especially for wheat and maize.

These large gains in grain yield for maize, wheat and rice demonstrate
a substantial potential to meet food demand on existing farmland.
Previous calculations based on historic yield trends suggested strong
constraints (for example, that Chinese rice yields reached a plateau of
~6.4 Mg ha⁻¹ by the mid-1990s; ref. 5), but our results demonstrate
that this plateau does not represent a biophysical yield limitation
(yield ceiling). We suggest that socio-economic factors, particularly
extremely small farm sizes and urbanization leading to an increase in the
proportion of part-time farmers, contribute to the observed yield
plateau (Extended Data Fig. 5). These socio-economic factors could diminish
with economic development and changes in land tenure or management
arrangements (Supplementary Discussion). More generally, while
previous calculations based on historic yield trends suggested that crop
yields have reached a plateau in much of the world, our large-scale
experimental results demonstrate that this suggestion requires careful
testing.

Equally importantly, our experiments demonstrate that substantially
increased yields can be produced with lower inputs of nitrogen fertilizer,
and so lower human and environmental costs (Fig. 2). Our survey of
farmers also shows that it is possible to achieve high yields in practice,
because we found that about 20% and 5% of rice and wheat farmers,
respectively, report yields already close to ISSM yields without using
excessive fertilizer (Extended Data Table 3). Even so, there is room for
further improvement: GHG emission intensities in ISSM (our best
treatment) are still higher than published results in other intensive
agricultural regions, such as maize in the USA (231 kg CO₂ eq Mg⁻¹ of grain,
and 13.2 Mg grain ha⁻¹), mainly because of coal-based nitrogen fertilizer
production in China. Improved fertilizer production technology and
innovative fertilizer products could play further roles in mitigating
GHG emissions.

Figure 1 | Reactive nitrogen losses and GHG emissions for four
management treatments, based on empirical models of losses and life-cycle
assessment. a-c, Reactive nitrogen losses (a, rice; b, wheat; c, maize) include
N₂O emission, nitrogen leaching, NH₃ volatilization and nitrogen runoff.
d-f, GHG emissions (d, rice; e, wheat; f, maize) include those from nitrogen
fertilizer application, nitrogen fertilizer production and transportation, other
sources (phosphorus and potassium fertilizer; crop management) and CH₄
emission (in rice). CP, current practice; IP, improved practice; HY,
high-yielding system. Means followed by the same footnote symbol(s) for each crop
are not significantly different at P < 0.05.

Figure 2 | Substantially increased yields can be produced with lower inputs
of nitrogen fertilizer, and so lower human and environmental costs. The
intensity of land use (a), nitrogen use (b), reactive nitrogen losses (c) and GHG
emissions (d) needed to produce 1 Mg of grain, for three crops and four
management treatments. Means followed by the same footnote symbol(s) for
each crop are not significantly different at P < 0.05.

Current yields and cropping areas across China combine to produce
204, 121 and 206 Mt of rice, wheat and maize annually, with 74% of
maize fed to livestock (with 5 Mt of imported maize, and 58 Mt of imported
soybean). With population and economic growth, demand for grain
in China is expected to reach 218, 125 and 315 Mt of rice, wheat and
maize by 2030, by which time China's population is expected to have
stabilized. If farmers could achieve grain yields of 80% of the yield level
in our ISSM treatment by 2030, using the same planting area as in 2012,
total production of rice, wheat and maize would reach 216, 174 and 397 Mt;
this is enough to meet the demand for direct human consumption and
domestically produced animal feed. Such yields would even suffice to
offset imports of animal feed (Fig. 3), while reducing nitrogen use,
reactive nitrogen losses and GHG emissions by 21%, 30% and 11%
respectively, compared with current levels (scenario 2 in Extended Data Table 4).
Further, if we simply reach the projected demand in 2030 with 80% of
ISSM yields, then reactive nitrogen losses and GHG emissions could be
reduced by 48% and 26%, and the land and nitrogen fertilizer used for
these three crops could also be reduced by 22% and 33% (scenario 3 in
Extended Data Table 4). This change could contribute to the
production of other crops and to the protection of natural ecosystems. Also, a
relative shift to maize will reduce agricultural demand for water in China.
However, if larger quantities of more sustainably produced grain are
allocated to an inefficient animal production system, overall benefits
will be reduced substantially. Increasing the efficiency and mitigating
the environmental/human costs of livestock production systems in China
deserves more attention.
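
The 2030 production estimate can be approximated with simple arithmetic. The sketch below assumes that projected production is 80% of the ISSM yield multiplied by the 2012 planting area; this section does not give the exact formula, so the product is an assumption. The yields are the ISSM means reported above, the planting areas are those listed in the footnote to Extended Data Table 4, and the dictionary and function names are hypothetical.

```python
# ISSM mean yields (Mg per ha) and 2012 planting areas (million ha).
ISSM_YIELD_MG_HA = {"rice": 8.5, "wheat": 8.9, "maize": 14.2}
AREA_2012_MHA = {"rice": 30.1, "wheat": 24.3, "maize": 35.0}
DEMAND_2030_MT = {"rice": 218.0, "wheat": 125.0, "maize": 315.0}


def projected_production_mt(crop: str, fraction_of_issm: float = 0.8) -> float:
    """Return production in Mt: (Mg per ha) x (million ha) = million Mg = Mt."""
    return fraction_of_issm * ISSM_YIELD_MG_HA[crop] * AREA_2012_MHA[crop]


for crop in ("rice", "wheat", "maize"):
    production = projected_production_mt(crop)
    print(f"{crop}: {production:.1f} Mt projected, "
          f"{DEMAND_2030_MT[crop]:.0f} Mt demanded")
```

Under this assumption the product gives about 173 Mt for wheat and 398 Mt for maize, in line with the 174 and 397 Mt quoted above, but only about 205 Mt for rice against the quoted 216 Mt. The published rice figure therefore presumably rests on area or yield data not given in this section.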

The gains in yield and environmental quality that can be achieved
through an integrated agronomic approach are striking, especially given
that yields in China are already higher than those in most developing
countries. The ISSM approach is agronomically robust and relatively
easy and inexpensive to adopt (Supplementary Discussion), although
the management practices employed for ISSM vary across different crops
and different regions. We believe that this approach can be applied
elsewhere, and that it should be possible to meet global food demand
with more sustainable intensive agriculture on existing cropland, thereby
sustaining other natural resources by avoiding the conversion of forest,
grassland and marginal lands to agriculture and supporting other
ecosystem services such as wetland preservation, wildlife conservation and
carbon sequestration. These benefits are achievable if we invest in agronomic
research that incorporates an ecosystem perspective, if the effort is
pursued across disciplinary and institutional boundaries, and if we provide
the technologies, arrangements and incentives that make it viable for
farmers to adapt and adopt more knowledge-intensive forms of agriculture.

Figure 3 | The projected demand of grain production for 2030 in China.
a, Rice; b, wheat; c, maize. Red bars, crop production from 2005 to 2012. Black
circles, projected demand in 2030. Black triangles, increasing grain yield by
the trend observed from 2005 to 2012, keeping planting area the same as in
2012. Green circles, grain yields that reach 80% of the level observed in our
ISSM treatment, over the same planting area as in 2012. Note differences in
scale for the different crops.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Field experiments. A total of 153 site-years of field experiments were conducted from
2009 to 2012 within the main agro-ecological areas for rice (n = 57), wheat (n = 40)
and maize (n = 56) production in China (Extended Data Fig. 1). Four treatments
were designed and compared: (1) current practice, which followed farmers'
practices in the region but was conducted in experimental plots; (2) improved practice,
which was based on improving farmers' practices beginning with an analysis of
limiting factors, followed by implementing key new technologies, mostly through
using root zone nutrient management to improve nutrient use efficiency, together
with known agronomic management practices (for example, increasing planting density)
to increase yield; (3) high-yielding, designed to test yield potential, where crop yields
were maximized through inputs so that they made full use of solar radiation and the
period with favourable temperatures, without considering the costs of various inputs;
(4) ISSM, which redesigned cropping systems using advanced crop and nutrient
management to bring yields closer to their biophysical potential, while optimizing
various resource inputs (that is, nutrients and water) and minimizing environmental
costs, based on an understanding of crop ecophysiology (for example, crop canopy,
solar radiation use and dry matter accumulation), the physiological nutrient demands
of high-yielding crops and the biogeochemical processes relating to nutrient
availability and loss.

A randomized complete block design with four replications was used for each
experiment. At maturity, grain yield and above-ground biomass were sampled and
measured in each plot, with 6 m² for wheat and rice, and 10 m² for maize. Their
nitrogen concentrations were determined using the Kjeldahl procedure. Fertilizer
and pesticide use as well as energy use for irrigation and soil tillage in each
treatment were recorded.

Survey of farmers. Representative farmers were selected for a face-to-face, questionnaire- 
based household survey conducted between 2007 and 2009. A total of 6,592 farmers 
(55 counties in 21 provinces), 6,940 farmers (113 counties in 18 provinces) and 5,406 
farmers (66 counties in 22 provinces) were surveyed for rice, wheat and maize in 
China, respectively. In each province, three to seven counties were randomly selected, 
and three townships were randomly selected in each county, then two to five villages 
were randomly selected in each township, and finally about 20 farmers from a village 
were randomly surveyed to collect information on fertilizer use and grain yield in 
each farmer’s household (Extended Data Table 1). All of these in-house surveys 
were conducted by professional research staff. Before beginning the survey, an informed 
consent information sheet was given to each farmer to read (or in some cases was 
read to the farmer), and verbal informed consent was requested. 

Data sources for establishing models of reactive nitrogen losses. An exhaustive
literature survey of peer-reviewed publications was undertaken using the ISI Web
of Science (Thomson Reuters), Google Scholar (Google) and the China Knowledge
Resource Integrated (CNKI) database, to identify articles published before December
2013. The literature survey focused on field measurements of nitrogen losses in the
major Chinese agricultural regions, including NH₃ volatilization, nitrogen
leaching, N₂O emissions and nitrogen runoff. Studies had to meet specific criteria to be
included in the data set. First, nitrogen losses must have been measured both
during field operations and throughout the entire growing season. Second, NH₃
volatilization must have been measured using either the micrometeorological method
or the wind tunnel method within at least 2 weeks after nitrogen fertilization. The
N₂O emissions must have been measured using the static chamber technique, daily
for 7-10 days after nitrogen fertilization and for 3-10 days after other events that
may have triggered N₂O gas emissions such as rainfall, irrigation or tillage, as well
as weekly or biweekly during the remaining periods; and nitrogen leaching must
have been measured using the suction cap or lysimeter method or the soil sample
method. Third, only studies that reported crop yields were included. Based on the
literature survey, the final data set consisted of 134 published references and 787
observations (Supplementary Information, extended reference list).

Reactive nitrogen losses and GHG emission calculations. Using extensive and
localized databases, reactive nitrogen loss models were developed based on the
relationships between N₂O emission, nitrogen leaching, runoff or NH₃ volatilization,
and nitrogen application rate or nitrogen surplus. These relationships were
subjected to linear or exponential regression analysis to identify the best-fit curves.
Corrected R² values were used for model selection in addition to visual inspection of
each response curve type. The results revealed an exponential relationship between
the nitrogen surplus and direct N₂O emissions, nitrogen leaching and runoff, while
NH₃ volatilization was linearly correlated with the rate of nitrogen fertilizer
application (Extended Data Figs 2-4).
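
The choice between a linear and an exponential response curve can be illustrated as follows. This is only a sketch under stated assumptions: the data points are synthetic (generated from an exponential curve, not taken from the literature data set), the exponential model is fitted by ordinary least squares on log-transformed losses, which may differ from the fitting procedure actually used, and the "corrected" R² is taken to be the usual adjusted R². All function and variable names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjusted_r2(observed, predicted, n_params=2):
    """Adjusted R^2 for a model with n_params fitted parameters."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)


# Synthetic example: N surplus (kg N per ha) and a loss term (kg N per ha).
surplus = [-50.0, 0.0, 50.0, 100.0, 150.0, 200.0]
loss = [0.5 * math.exp(0.01 * s) for s in surplus]

# Linear model: loss = a + b * surplus.
a_lin, b_lin = fit_line(surplus, loss)
linear_pred = [a_lin + b_lin * s for s in surplus]

# Exponential model: loss = a * exp(b * surplus), fitted on log(loss).
ln_a, b_exp = fit_line(surplus, [math.log(v) for v in loss])
exp_pred = [math.exp(ln_a + b_exp * s) for s in surplus]

scores = {
    "linear": adjusted_r2(loss, linear_pred),
    "exponential": adjusted_r2(loss, exp_pred),
}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Both scores are computed on the original scale so that they are comparable; the log transform requires strictly positive loss values.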

Nitrogen surplus was defined as nitrogen application minus above-ground nitrogen
uptake. Nitrogen uptake in field experiments was calculated by measured nitrogen
concentration multiplied by measured biomass, and in the survey of farmers it was
calculated by reported yield multiplied by the parameters of nitrogen required to
produce a unit of grain. Based on the established reactive nitrogen loss models,
we calculated the amount of reactive nitrogen lost to the environment, expressed
as kilograms of nitrogen per hectare, and the reactive nitrogen loss intensity
(reactive nitrogen losses per unit grain yield), expressed as kilograms of nitrogen per million
grams.

The total GHG emissions, including CO₂, CH₄ and N₂O during the whole life
cycle of crop production, consisted of three components: (1) those during nitrogen
fertilizer application, including direct and indirect N₂O emissions, which can be
calculated based on the empirical reactive nitrogen losses model mentioned above;
(2) those during nitrogen fertilizer production and transportation; and (3) those
during the production and transportation of phosphorus and potassium fertilizer and
pesticides to the farm gate, and diesel fuel use in farming operations such as sowing,
tillage and harvesting.

Total N₂O emissions resulting from anthropogenic nitrogen inputs to
agricultural soils occur through a direct pathway (that is, directly from the soils to which the
nitrogen is added), and through two indirect pathways: via the volatilization of
compounds such as NH₃ and NOₓ with subsequent re-deposition downwind and N₂O
emission there, and through leaching and runoff and subsequent N₂O emission
downstream. Indirect N₂O emissions can be estimated following the IPCC methodology,
whereby 1% and 0.75% of the volatilized NH₃-N and leached NO₃-N are lost as
N₂O-N, respectively. For rice, the impact of the CH₄ emissions was calculated as
carbon dioxide equivalents with 209 and 65 kg ha⁻¹ of emissions for single rice in
south and northeast China, respectively, and 245 and 323 kg ha⁻¹ for early rice and
late rice in the double rice system, respectively. The 100-year global warming
potentials of CH₄ and N₂O are 25 and 298 times the intensity of CO₂ on a mass basis,
respectively. The soil CO₂ flux was not included as a contribution to global
warming potential in our analysis: the net flux is much less than gross CO₂ emissions that
can be measured, and net fluxes have been estimated to contribute less than 1% to
the global warming potential of agriculture on a global scale. The change of soil
organic carbon content was also not included in our analysis, because it was
difficult to detect small changes in the short time our experiments were in place.
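
The indirect N₂O calculation described above can be sketched as follows. The emission factors (1% and 0.75%) and the global warming potential of N₂O (298) are the values stated in the text; the 44/28 factor is the molecular-mass conversion from N₂O-N to N₂O; the input quantities in the example are hypothetical, and the function names are illustrative only.

```python
EF_VOLATILIZED = 0.01    # fraction of volatilized NH3-N lost as N2O-N
EF_LEACHED = 0.0075      # fraction of leached NO3-N lost as N2O-N
GWP_N2O = 298.0          # 100-year GWP of N2O relative to CO2, mass basis
N2O_PER_N2O_N = 44.0 / 28.0  # kg N2O per kg N2O-N (molecular mass ratio)


def indirect_n2o_n(nh3_n_kg_ha: float, leached_n_kg_ha: float) -> float:
    """Return indirect N2O-N emissions in kg N per ha."""
    return EF_VOLATILIZED * nh3_n_kg_ha + EF_LEACHED * leached_n_kg_ha


def n2o_n_to_co2_eq(n2o_n_kg_ha: float) -> float:
    """Convert N2O-N (kg N per ha) to kg CO2 equivalents per ha."""
    return n2o_n_kg_ha * N2O_PER_N2O_N * GWP_N2O


# Hypothetical inputs: 20 kg NH3-N volatilized and 40 kg NO3-N leached per ha.
example_n2o_n = indirect_n2o_n(20.0, 40.0)
example_co2_eq = n2o_n_to_co2_eq(example_n2o_n)
print(f"{example_n2o_n:.2f} kg N2O-N ha^-1 = {example_co2_eq:.1f} kg CO2 eq ha^-1")
```

Direct N₂O emissions would be added to the indirect term before conversion; they come from the exponential surplus model and are not reproduced here.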

System boundaries were set as the periods of the life cycle from the production 
of inputs (such as fertilizers and pesticides), delivery of the inputs to the farm gates, 
farming operations and the crop harvesting period. Using the emission factors for 
all agricultural inputs given in Supplementary Table 1, we calculated total global warm- 
ing potential per unit area, expressed as kilograms of carbon dioxide equivalents 
per hectare, and the GHG intensity, expressed as kilograms of carbon dioxide equiv- 
alents per million grams of grain. 
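
As a small illustration of the intensity measures, the sketch below divides a per-area total by grain yield. It uses the mean values for farmers' rice fields in Extended Data Table 1 as inputs; the function name is hypothetical.

```python
def intensity(per_area_kg_ha: float, yield_mg_ha: float) -> float:
    """Return a loss or emission per Mg of grain (kg per Mg)."""
    return per_area_kg_ha / yield_mg_ha


# Farmers' rice fields (Extended Data Table 1, mean values).
rice_nr_intensity = intensity(66.0, 7.0)        # kg N per Mg grain
rice_ghg_intensity = intensity(10343.0, 7.0)    # kg CO2 eq per Mg grain
print(f"{rice_nr_intensity:.2f} kg N Mg^-1, {rice_ghg_intensity:.0f} kg CO2 eq Mg^-1")
```

The ratio of the means (about 9.4 kg N Mg⁻¹ and 1,478 kg CO₂ eq Mg⁻¹) is lower than the tabulated intensities (9.9 and 1,574), which suggests that the published values were averaged field by field rather than computed from the overall means.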

Projection of food demand for China in 2030. The human population of China
is projected to reach a peak of 1.47 billion around 2030, and the diet structure will
change to more animal-derived protein with the development of urbanization
(urbanization is projected to reach 80% in 2030). Using a nutrient flows in food chains,
environment and resource (NUFER) model, we project that the demand
for rice, wheat and maize in 2030 for China will be 218, 125 and 315 Mt, respectively,
for a total of 658 Mt for the three crops. Demand for animal feed is expected to
include 308 Mt of maize and another 50 Mt of soybean.

Data analysis. For all field experiments, data analysis used one-way analysis of
variance in SAS. The means of management treatments were compared using least
significant difference at a 0.05 level of significance for grain yield, nitrogen
application, PFP_N, nitrogen surplus, reactive nitrogen losses and GHG emissions.
Extended Data Figure 1 | The distribution of experiments for grain from
2009 to 2012 in China. a, Rice (n = 57); b, wheat (n = 40); c, maize (n = 56).
The background green colour represents the planting area for each crop; darker
green means a larger density of planting area regionally for that crop. The dots
represent sites, and each colour in a dot represents a year of measurements.
Extended Data Figure 2 | Linear models of NH₃ volatilization based on
nitrogen application rate. Rate of nitrogen fertilizer application was
plotted against NH₃-N volatilization for (a) rice (n = 265) (Supplementary
Information, extended references 1-36 for rice), (b) wheat (n = 34) and
(c) maize (n = 29) (Supplementary Information, extended references 37-60 for
wheat and maize) growing seasons, respectively. **P = 0.01. Filled and hollow
circles represent data from Chinese journals (or theses) and ISI journals,
respectively.
Extended Data Figure 3 | Exponential models of N₂O emissions and
nitrogen leaching based on nitrogen surplus. Nitrogen surplus was plotted
against N₂O-N emissions for (a) rice (n = 118) (Supplementary information,
extended references 7, 36, 61-84 for rice), (b) wheat (n = 40) and (c) maize
(n = 48) growing seasons (Supplementary information, extended references
85-99 for wheat and maize), and against nitrogen leaching for (d) rice (n = 52)
(Supplementary information, extended references 7, 100-113 for rice),
(e) wheat (n = 59) and (f) maize (n = 56) (Supplementary information,
extended references 44, 114-121 for wheat and maize). Nitrogen surplus was
defined as nitrogen application rate minus above-ground nitrogen uptake.
**Regression significant at P < 0.01. Solid and hollow circles represent data
from Chinese journals (or theses) and ISI journals, respectively.
Extended Data Figure 4 | Exponential model of nitrogen runoff based on
nitrogen surplus for rice production. Nitrogen surplus was defined as
nitrogen application rate minus above-ground nitrogen uptake (n = 81)
(Supplementary information, extended references 8, 104, 122-134).
**P < 0.01. Solid and hollow circles represent data from Chinese journals
(or theses) and ISI journals, respectively.


©2014 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Figure: y-axis scale 0-10; x-axis, Year (1982-2012); legend: large-scale farm, smallholder farm]

Extended Data Figure 5 | Rice yields over time in smallholder and large- 
scale farms in Heilongjiang province from 1982 to 2011. Large-scale farms 
were around 20 ha; smallholder farms were less than 2 ha. Data from refs 39, 40. 




Extended Data Table 1 | Grain yields, nitrogen application rates, calculated PFP_N, nitrogen surplus, the total and the intensity of reactive
nitrogen losses and GHG emissions in farmers' fields for rice (n = 6,592), wheat (n = 6,940) and maize (n = 5,406) in China

Crop    Statistic   Grain yield   N rate        PFP_N       N surplus     Nr losses     Nr intensity   GHG emission       GHG emission intensity
                    (Mg ha⁻¹)     (kg N ha⁻¹)   (kg kg⁻¹)   (kg N ha⁻¹)   (kg N ha⁻¹)   (kg N Mg⁻¹)    (kg CO₂ eq ha⁻¹)   (kg CO₂ eq Mg⁻¹)
Rice    Mean        7.0           209           41          82            66            9.9            10,343             1,574
        Range       4.5 to 9.8    86 to 412     16 to 83    -46 to 280    28 to 142     3.7 to 21      4,365 to 15,465    580 to 2,618
Wheat   Mean        5.7           210           33          74            65            12             3,707              671
        Range       3.8 to 7.5    81 to 360     15 to 67    -49 to 223    19 to 147     2.3 to 27      2,203 to 5,766     368 to 1,117
Maize   Mean        7.6           220           43          72            120           17             4,436              621
        Range       5.4 to 10.5   85 to 413     17 to 90    -66 to 256    36 to 274     4.6 to 39      2,273 to 8,269     287 to 1,179

Values are mean and range (from the 5th to 95th percentiles).
Extended Data Table 2 | Above-ground biomass, harvest index (HI)
and crop nitrogen uptake for rice (n = 57), wheat (n = 40) and maize
(n = 56) in field experiments with four management treatments

Crop    Treatment   Biomass (Mg ha⁻¹)   HI     Crop N uptake (kg N ha⁻¹)
Rice    CP          11.7                0.52   123
        IP          12.8                0.54   138
        HY          14.1                0.54   155
        ISSM        13.4                0.54   147
Wheat   CP          13.4                0.46   183
        IP          14.8                0.48   201
        HY          16.7                0.48   234
        ISSM        15.8                0.49   218
Maize   CP          18.4                0.49   194
        IP          20.8                0.52   222
        HY          .7                  0.52   261
        ISSM        23.3                0.52   249

Differences among treatments within each column for each crop were tested at
P < 0.05.
Extended Data Table 3 | Yield and nitrogen rates of farmer average,
top farmers and ISSM

Crop    Group                       Yield (Mg ha⁻¹)   N rate (kg N ha⁻¹)
Rice    Farmers average             7.0               209
        Top 20% yield of farmers    8.6               159
        ISSM                        8.5               162
Wheat   Farmers average             5.7               210
        Top 5% yield of farmers     8.4               234
        ISSM                        8.9               220
Maize   Farmers average             7.6               220
        Top 5% yield of farmers     11.3              229
        ISSM                        14.2              256
Extended Data Table 4 | Total production, weighted average of grain
yield and nitrogen rate, total land use, nitrogen fertilizer use, reactive
nitrogen losses, and GHG emissions in 2005, 2012 and projected in
2030 under three scenarios for all three crops (rice, wheat and
maize) in China

                                                2030
Item           Unit         2005    2012    S1      S2      S3
Production     Mt           417     531     656     786     658
Grain yield    Mg ha⁻¹      5.34    5.90    7.4     8.8     9.46
N rate         kg N ha⁻¹    213     217     213     172     188
Land use       Million ha   78      89      89      89      69
N use          Mt           16.6    19.4    19.0    15.4    12.9
Nr losses      Mt           6.7     7.9     8.3     5.5     4.1
GHG emission   Mt CO₂ eq    500     558     542     498     411

Scenario 1 (S1): 'business as usual', grain yield increased following the trend of the recent 8 years (from
2005 to 2012) and nitrogen application rate the same as in 2012 (there was almost no change from 2005
to 2012). In 2012, grain yields were 6.8, 5.0 and 5.9 Mg ha⁻¹, planting areas were 30.1, 24.3 and 35.0 Mha (ref.
23), and nitrogen application rates were 209, 210 and 220 kg N ha⁻¹ (Extended Data Table 1)
for rice, wheat and maize, respectively. Scenario 2 (S2): both grain yield and nitrogen application at
80% of ISSM, and land use the same as in 2012. Scenario 3 (S3): both grain yield and nitrogen application
at 80% of ISSM, and crop production just enough to reach projected demand for rice, wheat and
maize in 2030 (requiring less cropland).
A Hox regulatory network of hindbrain
segmentation is conserved to the base of vertebrates


Hugo J. Parker, Marianne E. Bronner & Robb Krumlauf


A defining feature governing head patterning of jawed vertebrates
is a highly conserved gene regulatory network that integrates
hindbrain segmentation with segmentally restricted domains of Hox gene
expression. Although non-vertebrate chordates display nested domains
of axial Hox expression, they lack hindbrain segmentation. The sea
lamprey, a jawless fish, can provide unique insights into vertebrate
origins owing to its phylogenetic position at the base of the vertebrate
tree. It has been suggested that lamprey may represent an
intermediate state where nested Hox expression has not been coupled to
the process of hindbrain segmentation. However, little is known
about the regulatory network underlying Hox expression in lamprey
or its relationship to hindbrain segmentation. Here, using a novel
tool that allows cross-species comparisons of regulatory elements
between jawed and jawless vertebrates, we report deep conservation
of both upstream regulators and segmental activity of enhancer
elements across these distant species. Regulatory regions from diverse
gnathostomes drive segmental reporter expression in the lamprey
hindbrain and require the same transcriptional inputs (for example,
Kreisler (also known as Mafba), Krox20 (also known as Egr2a)) in both
lamprey and zebrafish. We find that lamprey hox genes display
dynamic segmentally restricted domains of expression; we also isolated
a conserved exonic hox2 enhancer from lamprey that drives segmental
expression in rhombomeres 2 and 4. Our results show that coupling
of Hox gene expression to segmentation of the hindbrain is an ancient
trait with origin at the base of vertebrates that probably led to the
formation of rhombomeric compartments with an underlying Hox code.

The hindbrain of jawed vertebrates is a specialized region of the nerv- 
ous system characterized by its subdivision into repetitive segments called 
rhombomeres’. Anterior Hox genes are expressed in a nested pattern 
that is functionally coupled to this inherent segmentation program® "°. 
Non-vertebrate chordates possess patterned hox gene expression along 
the body axis'’""*, which may be regulated by conserved patterning sig- 
nals in chordate evolution”, but lack nervous system segmentation. More- 
over, key segmental regulatory elements from jawed vertebrate Hox clusters 
are not conserved in amphioxus or ascidians'*’. In jawed vertebrates 
(gnathostomes), a well-characterized, highly conserved gene regulatory 
network (GRN) integrates hindbrain segmentation and Hox pattern- 
ing®”. The jawless (agnathan) vertebrate, lamprey, has been postulated 
to represent an intermediate state with rudimentary hindbrain seg- 
mentation, but lacking registration with motoneuron patterning or 
nested Hox expression* °. However, little is known about gene regula- 
tory events underlying Hox expression or coupling to hindbrain seg- 
mentation in lamprey. Here we address the nature of the agnathan 
hindbrain GRN and the degree to which it has been evolutionarily con- 
served with that of gnathostomes. 

To explore upstream GRN inputs regulating Hox expression, we 
first asked whether gnathostome hindbrain regulatory elements were 
functional in the sea lamprey, Petromyzon marinus, the only agnathan 
for which the genome is sequenced and experimental manipulation of 
early embryos is feasible*’’. Using transgenic methodologies”, we 
developed a novel cross-species approach to compare activity of specific 


regulatory elements between jawed and jawless vertebrates, by creating 
a new construct that allows efficient transgenesis in both lamprey and 
zebrafish embryos. We chose a series of enhancers from different jawed 
vertebrates that mediate segmentally restricted expression in their species 
of origin (Fig. 1a and Extended Data Fig. 1), focusing on elements that 
worked across multiple species and have well-characterized direct inputs 
from Krox20, Kreisler, retinoic acid and/or Hox auto/cross-regulation. 

Analysis of F0 zebrafish embryos demonstrated that the majority of 
enhancers direct appropriate green fluorescent protein (GFP) reporter 
expression in segmental hindbrain domains (Fig. 1b, Extended Data 
Fig. 2 and Extended Data Table 1). The identities of segmental domains 
were determined by examining GFP expression in a zebrafish line express- 
ing mCherry in rhombomere (r)3/r5 under control of the endogenous 
krox20 locus. F1 lines were generated for many constructs and exhibited 
segmental expression patterns identical to those in F0 (Fig. 1b and 
Extended Data Figs 2, 3), confirming that analysis in F0 embryos accurately 
reflects enhancer-mediated regulatory activities. 

When tested for regulatory activity in lamprey, the same gnathostome 
constructs mediated segmental reporter expression, reminiscent of that 
seen in their host species and/or zebrafish. The restricted stripes of GFP 
expression reflect an ordered series of domains (Fig. 1b, Extended Data 
Fig. 2 and Extended Data Table 2), implying that these gnathostome 
enhancers are activated by upstream lamprey factors to mediate reporter 
expression in a rhombomeric fashion. Reporter expression spans mul- 
tiple developmental stages with variable onset between elements (Extended 
Data Fig. 4). The Hoxb1 enhancer is active first (developmental stage 
(st)18) in a broad domain that becomes restricted over time, followed 
by hoxb2 (st21), hoxb3 and Hoxb4 (st22). These data suggest that a similar 
underlying hindbrain GRN, with temporal colinearity reminiscent of 
gnathostomes, may be present in lamprey. 

Gnathostome rhombomeric enhancers have known cis-regulatory 
inputs: Krox20 for Epha4 (ref. 22) and Hoxb2 (refs 23, 24) and Kreisler 
and Krox20 for Hoxb3 (refs 25, 26). We asked whether their homologues 
might have similar segmental regulatory functions in agnathans. To test 
this, we generated constructs with mutated Kreisler and/or Krox20 sites 
within the zebrafish hoxb3 r5 enhancer (Fig. 1c, d). Mutation of the two 
Kreisler sites (mut kr1+kr2) completely eliminates reporter expression 
in both zebrafish and lamprey, whereas mutation of the Krox20 sites 
(mut kroxA+kroxB) modulates levels/efficiency of expression in both 
species (Extended Data Tables 1 and 2). These results are consistent with 
roles for Kreisler and Krox20 in the mouse Hoxb3 r5 enhancer”®, imply- 
ing homologous roles in the lamprey hindbrain. 

These data suggest that major components of the hindbrain GRN up- 
stream of Hox genes are conserved in lamprey. Therefore, we character- 
ized hindbrain expression patterns of lamprey kreisler and krox20 across 
multiple developmental stages (st19-26) (Fig. 2). krox20 is expressed 
in two stripes in a manner reminiscent of its gnathostome counterpart* 
(Fig. 2). We isolated a kreisler homologous gene that is expressed in a single 
stripe in the lamprey hindbrain (Fig. 2), similar to mouse Kreisler. The 
restricted expression of these key upstream regulators in lamprey sup- 
ports our interpretation of their inputs to reporter activities. 


1Stowers Institute for Medical Research, Kansas City, Missouri 64110, USA. 2Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, USA. 
3Department of Anatomy and Cell Biology, Kansas University Medical Center, Kansas City, Kansas 66160, USA. 
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Figure 1 | Conserved segmental activity of jawed vertebrate enhancers in 
zebrafish and lamprey. a, Schematic depicting components of the GRN for 
segmental Hox expression in the gnathostome hindbrain. The rhombomeric 
expression of upstream segmental regulators (blue) and the activity domains of 
known enhancer elements they control (green) are shown. RA, retinoic acid. 
b, GFP reporter expression in dorsal views of zebrafish and lamprey hindbrains 
mediated by enhancers from a. For zebrafish, two images of the same embryo 
are shown, GFP plus brightfield (top) and GFP plus endogenous r3r5-mCherry 
(middle) signals. The otic vesicle is circled and GFP+ rhombomeres are 
indicated. Letters in parentheses indicate the species of origin of the element: 
m, mouse; zf, zebrafish. Enh, enhancer; hpf, hours post-fertilization; nc, neural 
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Figure 2 | Expression of segmental regulators and hox genes in the lamprey 
hindbrain. Gene expression visualized by in situ hybridization in lamprey 
embryos at st19-26. Dorsal views are shown, with anterior to the top. 


Arrowheads indicate the onset of segmental-like gene expression in the 
developing hindbrain. a, anterior; l, left; p, posterior; r, right. 
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crest; Reg, regulator. c, The zebrafish hoxb3 r5 enhancer contains conserved 
Kreisler (kr; blue) and Krox20 (krox; purple) binding sites (asterisks). 
Mutations known to influence activity are detailed below the aligned sites”*. 
d, GFP reporter expression of wild-type and mutated (mut) versions of the r5 
enhancer in zebrafish (dorsal views) and lamprey (lateral views) embryos. 
Numbers (n) denote the proportion of embryos exhibiting segmental reporter 
expression. Extended Data Tables 1 and 2 provide the number of embryos and 
efficiency of specific expression for all constructs. Arrowheads indicate 
segment-like reporter expression in the lamprey hindbrain. a, anterior; 

d, dorsal; p, posterior; v, ventral. 


We next examined whether lamprey hox genes themselves display 
evidence of segmental expression. We previously identified two Hox clusters, 
Pm1 and Pm2, as well as several unassigned hox genes in P. marinus’. 
These probably represent a subset of the total hox gene complement; 
recent evidence from Lethenteron japonicum suggests up to six Hox 
clusters’, two of which are homologous to Pm1 and Pm2. Lamprey hox 
genes from paralogous groups 1-3, hox1 (Pm2), hox2 and hox3 (Pm1), 
display temporally dynamic hindbrain expression patterns. Early devel- 
opmental stages (st21-23) reveal prominent stripes of restricted expres- 
sion in the hindbrain for all three genes, apparently reflecting off-set 
segmental domains (Fig. 2) temporally correlating with robust stripes 
of both krox20 and kreisler expression. Later (st24-26), hox1 and kreisler 
are progressively downregulated in the hindbrain, while segmental expression 
for hox2 and hox3 becomes masked by their upregulation in 
other regions (Fig. 2). krox20 expression initiates at st20 and remains 
on throughout this developmental time course. Although previous ana- 
lysis of hox gene expression, focusing on st26 in the Japanese lamprey, 
found no evidence for segmental expression*”, the potential links between 
Hox expression and hindbrain segmentation were presumably missed 
owing to the dynamic and early nature of segmental expression of these 
genes. 

To identify endogenous lamprey cis-regulatory regions that mediate 
these striking segmental hox gene expression domains, we focused on 
the hox2 paralogous group, well-characterized from a regulatory per- 
spective in jawed vertebrates*”. We sequenced the hox2 locus and entire 
intergenic region between hox2 and hox3 of Pm1, as this genomic region 
in gnathostomes contains a series of enhancers that mediate hindbrain 
Hox expression (Fig. 3a and Extended Data Fig. 1). Because no overt 
sequence conservation with known jawed vertebrate enhancers was 
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Figure 3 | Identification of enhancers from the lamprey hox2 locus. a, The 
hoxa2-hoxa3 genomic region from gnathostomes and the equivalent region 
from the lamprey Pm1 Hox cluster. hox gene exons (blue arrows) and relative 
positions of previously characterized enhancer elements in gnathostomes 
(green ovals) are shown’. hox2 enhancers identified in this study are denoted 
as grey ovals. Fragments of Pm1 tested in lamprey reporter assays are shown 
below. b, Lateral views of st26 lamprey embryos comparing the endogenous 
expression of Pm1 hox2 with GFP reporter expression mediated by fragments 
of Pm1. Pharyngeal arches are numbered. nt, neural tube; s, somites; 

ph, pharynx. c, Dorsal views of st24 lamprey embryos showing endogenous 
expression of Pm1 hox2 compared with GFP reporter expression. The exon 1-2 
region mediates two stripes of segmental expression (Extended Data Table 2 
provides information on number of embryos and efficiency of specific 
expression for the exon 1-2 region). Arrowheads indicate the anterior extent of 
expression in the neural tube. 


detectable, we functionally tested sequences from −12 kb upstream to 
+1 kb downstream of the lamprey Hox2 coding domain in lamprey 
embryos (Fig. 3a–c). At st26, the −12 kb intergenic region mediates GFP 
expression in the neural tube, pharynx (neural crest) and somites that 
closely resembles that of the endogenous lamprey hox2 gene (Fig. 3b). 
Deletion analyses demonstrate that cis-elements capable of mediating 
neural expression lie within the −9 kb to −4 kb intergenic region whereas 
those contributing to neural crest/somite expression lie in the −4 kb 
fragment (Fig. 3a). 

Given that gnathostome Hoxa2 is expressed in r2 and r4 via exonic 
and intronic regulatory elements”*” (Fig. 3a and Extended Data Fig. 1), 
we tested a comparable fragment of lamprey hox2 (exon 1-2). Intriguingly, 
this fragment mediated restricted expression in two alternating stripes 
in the hindbrain from st22 to st26 (Fig. 3c and Extended Data Fig. 4). At 
st24, endogenous hox2 neural expression displays regions of varying 
intensity, apparently correlating with these stripes of GFP (Fig. 3c). The 
anterior boundary of GFP expression in the hindbrain mediated by both 
the −12 kb fragment and the exon 1-2 region appears to match that of 
the endogenous hox2 gene (Fig. 3c). Hence, hox2, as in jawed vertebrates, 
contains multiple enhancers with partially overlapping/shadow activities. 
The equivalent positions of rhombomeric enhancer(s) of hox2 and 
Hoxa2 genes suggest that lamprey hox genes may be coupled to hindbrain 
segmentation in part through conserved cis-elements. 

The lack of apparent morphological hindbrain segmentation in lam- 
prey makes it difficult to assign these gene expression patterns to specific 
features. To register these expression patterns, we performed multispec- 
tral analysis using co-injection of two fluorescent reporters. The hoxb3 
enhancer was used to direct red fluorescent protein (RFP) in putative r5, 
allowing registration with other enhancer-mediated GFP expression 
(Fig. 4a, b). hox2 exon 1-2 mediates expression in r2 and r4; Epha4 in 
r3; hoxb2 in r4; and Hoxb4 with an anterior border of expression within 
r7 (Fig. 4a, b). These segmental domains generally correlate with the activity 
of these cis-elements in gnathostomes, although the Epha4 enhancer 
mediates expression only in r3 in lamprey as compared with r3/r5 
in zebrafish (Fig. 1b). The hoxb2 enhancer drives robust r3/r5 expression 
and weaker r4 expression in zebrafish (Fig. 1b), whereas the strongest 
expression in lamprey is in r4 and there is weak expression in r6. Some 
embryos exhibit weaker r3/r5 expression, suggesting that the Krox20 
sites in this enhancer are only moderately functional in lamprey. These 
data confirm that regulatory elements from both jawed and jawless 
vertebrates can mediate adjacent rhombomere-like segmental express- 
ion domains in the lamprey hindbrain. 

To compare endogenous with enhancer-driven domains of express- 
ion, we performed two-colour double in situ hybridization (Fig. 4c). 
Using krox20 as a reference for r3/r5, we mapped the site of hox1 express- 
ion to r4. Similarly, by comparison with krox20 and/or hox1, we mapped 


Figure 4 | Comparison of enhancer activity and 
segmental gene expression in lamprey supports 
an origin of the hindbrain GRN at the base of 
vertebrates. a, b, The register of segmental 
domains of GFP expression mediated by lamprey 
and gnathostome enhancers in st24 lamprey 
embryos (a) are mapped to putative rhombomeres 
(2-7) by direct comparison with a co-injected r5 
enhancer from zebrafish hoxb3 linked to RFP (b). 
For hoxb2 a weaker r6 stripe begins to appear at 
st23. c, Double in situ hybridization reveals that 
endogenous hox gene expression and GFP reporter 
expression align with segmental regulators in the 
lamprey hindbrain. Dorsal (top) and lateral 
(bottom) views of st23-24 embryos are shown with 
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©2014 Macmillan Publishers Limited. All rights reserved 


kreisler expression domains to r5, hox2 to r2-5 with elevated stripes in 
r3/r5, and the anterior stripe of hox3 expression to r5. An antisense 
GFP probe positions expression directed by the hoxb3 enhancer to r5. 
This analysis demonstrates that lamprey hox genes are expressed in a 
nested pattern that corresponds to the same segmental territories as 
their gnathostome counterparts. 

By taking advantage of the unique evolutionary position of lamprey 
at the base of vertebrates, we have resolved a fundamental question in 
vertebrate evolution concerning the origin of segmental Hox pattern- 
ing in the hindbrain. Our results reveal an amazing degree of conser- 
vation in both transcriptional inputs (Krox20, Kreisler) and regulatory 
element activity between jawed and jawless vertebrates (Fig. 4d). Lamprey 
hox genes display transient offset segmental expression domains, imply- 
ing that the lamprey hindbrain, as in gnathostomes, is composed of 
identifiable rhombomeric segments with an underlying Hox code. Thus, 
we conclude that the coupling of Hox gene expression to segmentation 
of the hindbrain via Krox20 and Kreisler is an ancient vertebrate trait 
that evolved before the agnathan/gnathostome split (Fig. 4e). 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


Enhancer elements. Enhancer elements were selected from the published data or 
identified based upon cross-species sequence alignments. DNA containing each 
element was amplified by PCR from genomic DNA templates using Phusion High- 
Fidelity DNA Polymerase (NEB). The primers listed below were used for amplifica- 
tion and the size of each amplified fragment is indicated (in bp). The sequences in 
bold represent homology to genomic DNA and adaptor sequences are in non-bolded 
text. 
Mouse Hoxb1 (ref. 31) (378 bp) F: 5'-AATTTGGGGCCCTCTAATAATCCAAGAACCTATTGAAGG-3'; R: 5'-TACAACCTCGAGCAGTATGTCACAGAGCTGAAG-3'. 
Mouse Hoxa2 (ref. 32) (808 bp) F: 5'-GATGCTGGGCCCAGATCTGAATGCTGGAGCAGTCTCAG-3'; R: 5'-CATAGCCTCGAGGTACCTTCTCTCCCTCAAACC-3'. 
Zebrafish hoxa2b (2,960 bp) F: 5'-GGGTATTAAACAGGTATCTGAATGC-3'; R: 5'-AAATTCGCCGCTCTCAAAT-3'. 
Fugu rubripes Hoxa2a (1,404 bp) F: 5'-ATCTGAGGGCCCTGGCTTAATGCAAACGCTATATTT-3'; R: 5'-GTACATCTCGAGCCCTATTTCGAATACGACTCTG-3'. 
Fugu rubripes Hoxa2b (1,263 bp) F: 5'-TGCTGTAATGCCAAAACCTC-3'; R: 5'-CCTGCCTCGCCTTCGTGCCG-3'. 
Mouse Hoxb2 (ref. 33) (2,021 bp) F: 5'-ATGCGTGGGCCCGGATCCCCACTTTAACACCCAAG-3'; R: 5'-GTACAGCTCGAGTCTCCGCCAATCGCTAGT-3'. 
Zebrafish hoxb2a (1,488 bp) F: 5'-TGACCCCATTCCGTAGTACC-3'; R: 5'-TATTTTGCGCTCCTGCTATG-3'. 
Mouse Epha4 (ref. 22) (496 bp) F: 5'-AACTGAGGGCCCAGCATGGAGCTCTCTTAGCGTA-3'; R: 5'-TCATTACTCGAGTTTCGGGCTCTAGATCTGC-3'. 
Mouse Hoxb3 (ref. 25) (649 bp) F: 5'-AGCTCTCTCGAGCAGTAGGATCCCAGGT-3'; R: 5'-GCTAATCTCGAGGAGGCCTGTAGGAGGAAG-3'. 
Zebrafish hoxb3a (928 bp) F: 5'-AATGGAGGGCCCGTGTCCGGAAGTGTCGTTTC-3'; R: 5'-AGGGAACTCGAGCTCCAGTGAGTCCTGGTC-3'. 
Mouse Hoxb4 (ref. 34) (951 bp) F: 5'-AACTGAGGGCCCTGGAATTGGTTGGGTTTTCT-3'; R: 5'-TATCTCCTCGAGTGTCCATGGTGGAAAGC-3'. 
Mouse Hoxd4 (ref. 35) (582 bp) F: 5'-ACAAGTGGGCCCTGGAGGAAGGGCTAGCTTAAA-3'; R: 5'-AAAAAGCTCGAGAAGGGTAGTTAAAGTCCAAAAGG-3'. 
Lamprey hox2 exon 1-2 (2,625 bp) F: 5'-CGATGAGTCGACAGTTTGAGCGGGAAACTGG-3'; R: 5'-CTAATCGTCGACCGAAATCTATTGCGCCTACA-3'. 

Generation of reporter constructs. The Hugo’s lamprey construct (HLC) vector 
and its variants (HLC-GW, HLC-RFP) were created for this study (reagents and 
sequences are available on request). PCR-purified enhancer elements were cloned 
into HLC using either standard restriction-enzyme-mediated methods or by first 
cloning PCR products into the pCR8/GW/TOPO TA vector (Invitrogen) followed 
by transfer into a Gateway-compatible variant of HLC (HLC-GW) via in vitro 
recombination using the Gateway LR-Clonase II enzyme (Invitrogen). 

The 12 kb intergenic region between lamprey hox2 and hox3 of the Pm1 cluster 
was cloned into HLC by homologous capture from lamprey bacterial artificial 
chromosome (BAC) 218A09 (L6)’ following previously described recombineering 
methods” and using the following homology arm sequences (homology arms indi- 
cated in bold). Arm 1, 5'-GGGCCCGTACACGGACCTGTCGTCTCATCACC 
ACCCGACTCAGGAAGTACTAGT-3’; arm 2, 5’-ACACCCCCCCCCCTCCT 
CGCTCAGTGCTCCGTCAAGGCAGCCATGG-3’. 

Shorter fragments of this intergenic region were subsequently generated from the 
captured 12 kb sequence by standard restriction-enzyme-mediated cloning approaches. 

Site-directed mutagenesis was performed on the zebrafish hoxb3 HLC construct 
using the QuikChange II XL Site-Directed Mutagenesis Kit (Stratagene) and the 
primers listed below. The bold text indicates mutated sequences that differ from 
wild type. 
kr1mutF: 5'-GITGTTTTCTGCATTTCGTTGCCTCCTTGCACGTGTTAGTTAATTAGTG-3'; 
kr1mutR: 5'-CACTAATTAACTAACACGTGCAAGGAGGCAACGAAATGCAGAAAACAC-3'; 
kr2mutF: 5'-CAATGCCGTTTAGTAAAAAGTCAAGGACACCTACATTTTTGCCTTG-3'; 
kr2mutR: 5'-CAAGGCAAAAATGTAGGTGTCCTTGACTTTTTACTAAACGGCATTG-3'; 
kroxAmutF: 5'-GCCTTCCTCCCAGCCCGTTGGTGATGC-3'; 
kroxAmutR: 5'-GCATCACCAACGGGCTGGGAGGAAGGC-3'; 
kroxBmutF: 5'-GTTGCAGACACCGACATTTTTGCCTTGTGC-3'; 
kroxBmutR: 5'-GCACAAGGCAAAAATGTCGGTGTCTGCAAC-3'. 
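
Mutagenesis primer pairs of this kind are normally designed as exact reverse complements of each other, so a transcribed pair can be sanity-checked computationally. The sketch below is a minimal illustration, not part of the published protocol: the function names are assumptions, and only the kroxA and kroxB pairs listed above are used as inputs.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_complementary_pair(forward: str, reverse: str) -> bool:
    """True if the reverse primer is the exact reverse complement of the forward primer."""
    return reverse_complement(reverse) == forward.upper()


# Primer sequences copied from the list above (kroxA and kroxB pairs only).
KROXA_MUT_F = "GCCTTCCTCCCAGCCCGTTGGTGATGC"
KROXA_MUT_R = "GCATCACCAACGGGCTGGGAGGAAGGC"
KROXB_MUT_F = "GTTGCAGACACCGACATTTTTGCCTTGTGC"
KROXB_MUT_R = "GCACAAGGCAAAAATGTCGGTGTCTGCAAC"

if __name__ == "__main__":
    for name, fwd, rev in [
        ("kroxA", KROXA_MUT_F, KROXA_MUT_R),
        ("kroxB", KROXB_MUT_F, KROXB_MUT_R),
    ]:
        print(name, "pair complementary:", is_complementary_pair(fwd, rev))
```

A pair that fails this check points to a transcription problem in one of the two sequences rather than to a property of the primers themselves.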

Zebrafish and lamprey experiments. This study was conducted in accordance 
with the recommendations in the Guide for the Care and Use of Laboratory 
Animals of the National Institutes of Health and protocols were approved by 
the Institutional Animal Care and Use Committees of the Stowers Institute (zebra- 
fish, RK Protocol #2013-0110) and California Institute of Technology (lamprey, 
MEB Protocol #1436-11). 

Zebrafish reporter assay. The following zebrafish lines were used for embryo 
micro-injection experiments: Slusarski AB (wild type); egr2b:KalTA4BI-1xUASkCherry 
(r3r5-mCherry)”’. Transient transgenic zebrafish embryos were generated for 
each reporter construct by Tol2-mediated transgenesis in fertilized eggs as described 


previously’*. In general a minimum of 100 embryos were injected to monitor 
efficiency for each construct due to mosaicism and position effects of integration. 
GFP-expressing transient transgenic embryos were raised to adulthood and crossed 
with either wild-type or r3r5-mCherry fish to screen for germline transgene integ- 
ration’*. Embryos were screened for fluorescent reporter expression using a Leica 
M205FA microscope. Fluorescence and bright-field signals were captured with a 
Leica DFC360FX camera using LAS AF imaging software. Images were cropped and 
alterations to brightness and contrast were made using Adobe Photoshop CS5.1. 
Lamprey reporter assay. Embryos were harvested from gravid lamprey (P. marinus) 
caught in the wild and provided by Hammond Bay Biological Station. Transient 
transgenic P. marinus embryos were generated by I-Scel meganuclease-mediated 
transgenesis as described previously”. Single-celled embryos at 4-6 h post-fertilization 
were injected with the digested construct at a concentration of 20 ng µl⁻¹, maintained 
as described previously’’ and screened for reporter expression daily from st17 (ref. 39) 
onwards. In general a minimum of 100 embryos were injected to monitor efficiency 
for each construct due to mosaicism and position effects of integration. The zebra- 
fish hoxb3-HLC-RFP construct, containing RFP rather than GFP, was created for 
the co-injection experiments. Co-injected constructs were mixed at a concentration 
of 15 ng µl⁻¹ each (resulting in a total DNA concentration of 30 ng µl⁻¹) and 
digested for injection. Embryos were screened for fluorescence using a Zeiss SteREO 
Discovery V12 microscope and imaged with a Zeiss Axiocam MRm camera and 
AxioVision Rel 4.6 software. Images were cropped and altered for brightness and 
contrast using Adobe Photoshop CS5.1. 

Cloning lamprey in situ hybridization probes. Exonic probes were designed 
based on previously characterized/predicted gene sequences* and were amplified 
from P. marinus genomic DNA by PCR using Phusion High-Fidelity DNA Polymerase 
and cloned into the pCR4-TOPO vector. The size of each amplified fragment is 
indicated (in bp). For generating 5’ and 3’ untranslated region (UTR) probes, RNA 
from st18-26 P. marinus embryos was extracted using the RNAqueous Total RNA 
Isolation Kit (Ambion) and used as a template for 5’ or 3’ rapid amplification of 
cDNA ends (RACE) with the GeneRacer Kit and SuperScript III RT (Invitrogen). 
cDNA fragments were amplified by PCR using Phusion High-Fidelity DNA 
Polymerase and cloned into the pCR4-TOPO vector. The following primers were 
used for PCR. 
krox20 (ref. 4) (468 bp, predicted exonic fragment) F: 5'-CCACAAGCCCTTCCAGTG-3'; R: 5'-GGTGAGGACATCAGCGAGAG-3'. 
kreisler (529 bp, 5' UTR and partial exon) F: GeneRacer 5' Nested Primer; R: 5'-GAGAGGGCCGCTCGGAGAACTTGA-3'. 
Pm2 hox1 (949 bp, partial exon and 3' UTR) F: 5'-CAGAACCGGCGCATGAAGCAGAAGA-3'; R: GeneRacer 3' Nested Primer. 
Pm1 hox2 (471 bp, partial exon 2) F: 5'-CAAGCGGCAGACTCAGTACA-3'; R: 5'-AGGTCCAGCGTGCTCTCTAA-3'. 
Pm1 hox3 (661 bp, partial exon 2) F: 5'-GACGAGTTGAAATGGCCAAC-3'; R: 5'-TGAGACGACAGGTCCGTGTA-3'. 
eGFP (709 bp) F: 5'-CAAGGGCGAGGAGCTGTT-3'; R: 5'-CTTGTACAGCTCGTCCATGC-3'. 

Lamprey in situ hybridization. Digoxygenin- and fluorescein-labelled probes 
were generated by standard methods and used in single and double lamprey whole- 
mount in situ hybridization as described previously’. Embryos were cleared in a 
solution of 75% glycerol before being imaged using a Leica MZ APO microscope 
with a Zeiss Axiocam HRc camera and Axiovision Rel 4.8 software. Images were 
cropped and altered for brightness and contrast using Adobe Photoshop CS5.1. 


31. Pöpperl, H. et al. Segmental expression of Hoxb1 is controlled by a highly 
conserved autoregulatory loop dependent upon exd/Pbx. Cell 81, 1031-1042 
(1995). 

32. Nonchev, S. et al. Segmental expression of Hoxa-2 in the hindbrain is directly 
regulated by Krox-20. Development 122, 543-554 (1996). 

33. Maconochie, M. K. et al. Cross-regulation in the mouse HoxB complex: the 
expression of Hoxb2 in rhombomere 4 is regulated by Hoxb1. Genes Dev. 11, 
1885-1895 (1997). 

34. Gould, A., Itasaki, N. & Krumlauf, R. Initiation of rhombomeric Hoxb4 expression 
requires induction by somites and a retinoid pathway. Neuron 21, 39-51 (1998). 

35. Zhang, F., Nagy Kovacs, E. & Featherstone, M. S. Murine Hoxd4 expression in the 
CNS requires multiple elements including a retinoic acid response element. 
Mech. Dev. 96, 79-89 (2000). 

36. Nolte, C., Jinks, T., Wang, X., Martinez Pastor, M. T. & Krumlauf, R. Shadow 
enhancers flanking the HoxB cluster direct dynamic Hox expression in early heart 
and endoderm development. Dev. Biol. 383, 158-173 (2013). 

37. Distel, M., Wullimann, M. F. & Köster, R. W. Optimized Gal4 genetics for 
permanent gene expression mapping in zebrafish. Proc. Natl Acad. Sci. USA 106, 
13365-13370 (2009). 

38. Fisher, S. et al. Evaluating the biological relevance of putative enhancers using 
Tol2 transposon-mediated transgenesis in zebrafish. Nature Protocols 1, 
1297-1305 (2006). 

39. Tahara, Y. Normal stages of development in the lamprey, Lampetra reissneri 
Dybowski. Zool. Sci. 5.1, 109-118 (1988). 




[Figure: schematic of the enhancer elements tested and their species of origin: Hoxb1 (mouse); 
Hoxa2 (mouse, zebrafish, Fugu); Hoxb2 (mouse, zebrafish); Hoxb3 (mouse, zebrafish); 
Hoxb4 (mouse); Hoxd4 (mouse); EphA4 (mouse).] 

Extended Data Figure 1 | Gnathostome enhancer elements selected for 
reporter analysis. Schematic diagrams depicting the gnathostome enhancer 
elements assayed for activity in zebrafish and lamprey embryos in this study. 
The endogenous genomic positions of the enhancer elements (green boxes) are 
shown relative to the genes that they regulate. Known trans-acting factors are 
listed above the elements, while the corresponding regulatory modules and 
their combined activity domains are detailed below the elements. For each 
element, the species from which it was cloned are listed on the right. Figure 
adapted with permission from figure 4.2 in ref. 9. 
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Extended Data Figure 2 | Segmental activity of additional jawed vertebrate 
enhancers in zebrafish and lamprey. GFP reporter expression mediated by 
gnathostome enhancer elements in zebrafish and lamprey embryos. Dorsal 
views are shown, with anterior to the top. For zebrafish, two images of the same 
embryo are shown, presenting GFP plus brightfield (top) and GFP plus 
endogenous r3r5-mCherry (middle) signals. The zebrafish otic vesicle is 
circled. m, mouse; zf, zebrafish. 
Extended Data Figure 3 | Segmental patterns of GFP reporter expression in 
transgenic zebrafish lines. Lateral (top) and dorsal (middle) views of 30 hpf 
transgenic (F1) zebrafish embryos show combined brightfield illumination 
and segmental GFP reporter expression in the hindbrain mediated by five 
different gnathostome enhancer elements. The corresponding transient 
transgenic GFP expression patterns mediated by these elements are shown in 
Fig. 1b and Extended Data Fig. 2. When available, GFP lines were crossed 
with the endogenous r3r5-mCherry reporter line as a reference (bottom). 
The otic vesicle is circled. m, mouse; zf, zebrafish. 
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Extended Data Figure 4 | Developmental time course of GFP reporter 
expression mediated by lamprey and gnathostome regulatory elements in 
lamprey embryos. Developmental stages st18-26 are shown. All embryos are 
positioned such that the hindbrain is viewed dorsally, with anterior to the top, 
except for mouse Hoxb4 at st22, which is viewed laterally with anterior to 
the left. For hoxb2 a weaker r6 stripe begins to appear at st23. Black boxes 
indicate no GFP expression mediated by that element at that developmental 
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stage. In both fish and lamprey, expression driven by the gnathostome Hoxb1 
enhancers appears to be temporally dynamic, starting broad and refining 
with time, which is probably caused by autoregulation within this element. 
However, we cannot rule out the possibility that the enhancers used may be 
missing some repressor elements that are required for fine-tuning. m, mouse; 
zf, zebrafish. 
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Extended Data Table 1 | Zebrafish reporter assay statistics 


Element                        Expression domain        # embryos   # specific expression   % specific expression 

Gnathostome elements 
Hoxb1(m)                       hindbrain                230         218                     94.8 
Hoxa2(m)                       hindbrain                145         64                      44.1 
Hoxa2b(zf)                     hindbrain                125         46                      36.8 
Hoxa2a(fr)                     hindbrain                123         16                      13.0 
Hoxa2b(fr)                     no specific expression   N/A         N/A                     N/A 
Hoxb2(m)                       hindbrain                199         75                      37.7 
Hoxb2a(zf)                     hindbrain                147         95                      64.6 
EphA4(m)                       hindbrain                195         172                     88.2 
Hoxb3(m)                       hindbrain                98          70                      71.4 
Hoxb3a(zf)                     hindbrain                549         503                     91.6 
Hoxb4(m)                       spinal cord              160         125                     78.1 
Hoxd4(m)                       spinal cord              324         141                     43.5 

Hoxb3a(zf) dissection 
Hoxb3a(zf) exp 1               hindbrain                194         161                     83.0 
Hoxb3a(zf) exp 2               hindbrain                142         137                     96.5 
Hoxb3a(zf) exp 3               hindbrain                213         205                     96.2 
Hoxb3a(zf) kr12 mut exp 1      hindbrain                176         0                       0.0 
Hoxb3a(zf) kr12 mut exp 2      hindbrain                220         0                       0.0 
Hoxb3a(zf) kroxAB mut exp 1    hindbrain                162         14                      8.6 
Hoxb3a(zf) kroxAB mut exp 2    hindbrain                219         55                      25.1 


For each injected construct, the tissue-specific GFP expression domains are noted, along with the number and proportion of screened embryos exhibiting GFP expression in those domains. In each case, the 
numbers derive from individual rounds of injection, except for zebrafish hoxb3a, for which the data from three separate experiments (exp 1-3), which were performed to ensure reproducibility, were combined. 
Letters in parentheses after the element names indicate the species of origin of the element: fr, Fugu rubripes; m, mouse; zf, zebrafish. N/A, numbers on efficiency not available. 
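
The efficiency column in these tables is the number of embryos with specific expression divided by the number of embryos screened, and the pooled zebrafish hoxb3a row is the sum over its three experiments. The sketch below reproduces that arithmetic for a few rows of Extended Data Table 1. The function names are illustrative assumptions, and rounding to one decimal place is inferred from the printed values rather than stated in the paper.

```python
def percent_specific(n_specific: int, n_embryos: int) -> float:
    """Percentage of screened embryos showing specific expression, to one decimal place."""
    return round(100.0 * n_specific / n_embryos, 1)


def pool(experiments):
    """Sum (n_embryos, n_specific) pairs from separate rounds of injection."""
    total_embryos = sum(n for n, _ in experiments)
    total_specific = sum(s for _, s in experiments)
    return total_embryos, total_specific


# (n_embryos, n_specific) pairs taken from Extended Data Table 1.
HOXB1_M = (230, 218)
HOXA2_M = (145, 64)
HOXB3A_ZF_EXPERIMENTS = [(194, 161), (142, 137), (213, 205)]

if __name__ == "__main__":
    print("Hoxb1(m):", percent_specific(HOXB1_M[1], HOXB1_M[0]))
    print("Hoxa2(m):", percent_specific(HOXA2_M[1], HOXA2_M[0]))
    embryos, specific = pool(HOXB3A_ZF_EXPERIMENTS)
    print("Hoxb3a(zf) pooled:", embryos, specific, percent_specific(specific, embryos))
```

The pooled figures agree with the single Hoxb3a(zf) row in the table (549 embryos, 503 with specific expression).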
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Extended Data Table 2 | Lamprey reporter assay statistics 


Element                       Stage  Expression domain               # embryos  # specific expression  % specific expression

Gnathostome elements
Hoxb1(m)                      22     neural tube                     231        137                    59.3
Hoxa2(m)                      24     neural crest                    264        138                    52.3
Hoxa2b(zf)                    23     neural crest                    261        120                    46.0
Hoxa2a(fr)                    23     hindbrain and neural crest      246        57                     23.2
Hoxa2b(fr)                    24     pharynx                         218        70                     32.1
Hoxb2(m)                      N/A    no specific expression          N/A                               N/A
Hoxb2a(zf)                    23     hindbrain                       192                               58.9
EphA4(m)                      23     hindbrain                       695                               14.4
Hoxb3(m)                      24     hindbrain                       324                               9.9
Hoxb3a(zf)                    23     hindbrain                       1440                              32.9
Hoxb4(m)                      24     hindbrain & spinal cord         590                               28.6
Hoxd4(m)                      25     hindbrain & spinal cord         300                               9.3

Hoxb3a(zf) dissection
Hoxb3a(zf) exp 1              23     hindbrain                       435                               56.8
Hoxb3a(zf) exp 2              23     hindbrain                       557                               16.7
Hoxb3a(zf) exp 3              23     hindbrain                       448                               29.9
Hoxb3a(zf) kr12 mut exp 1     23     hindbrain                       407                               0.0
Hoxb3a(zf) kr12 mut exp 2     23     hindbrain                       437                               0.5
Hoxb3a(zf) kr12 mut exp 3     23     hindbrain                       446                               0.0
Hoxb3a(zf) kroxAB mut         23     hindbrain                       522                               9.0

Lamprey elements
Hox2 -12kb                    24     neural tube, pharynx, somites   N/A                               N/A
Hox2 -9kb                     23     neural tube, pharynx, somites   N/A                               N/A
Hox2 -4kb                     23     pharynx, somites                N/A        N/A                    N/A
Hox2 exon1-2                  23     hindbrain                       406        123                    30.3


For each injected construct, the tissue-specific GFP expression domains are given, along with the number and proportion of screened embryos exhibiting GFP expression in those domains. In each case, the 
numbers derive from individual rounds of injection, except for zebrafish hoxb3a, for which the data from three separate experiments (exp 1-3), which were performed to ensure reproducibility, were combined. 
Letters in parentheses after the element names indicate the species of origin of the element: fr, Fugu rubripes; m, mouse; zf, zebrafish. N/A, numbers on efficiency not available. 
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Pre-Columbian mycobacterial genomes reveal seals 
as a source of New World human tuberculosis 


Kirsten I. Bos, Kelly M. Harkins, Alexander Herbig, Mireia Coscolla, Nico Weber, Iñaki Comas, Stephen A. Forrest,
Josephine M. Bryant, Simon R. Harris, Verena J. Schuenemann, Tessa J. Campbell, Kerttu Majander, Alicia K. Wilbur,
Ricardo A. Guichon, Dawnie L. Wolfe Steadman, Della Collins Cook, Stefan Niemann, Marcel A. Behr,
Martin Zumarraga, Ricardo Bastida, Daniel Huson, Kay Nieselt, Douglas Young, Julian Parkhill, Jane E. Buikstra,
Sebastien Gagneux, Anne C. Stone & Johannes Krause


Modern strains of Mycobacterium tuberculosis from the Americas
are closely related to those from Europe, supporting the assumption
that human tuberculosis was introduced post-contact. This notion,
however, is incompatible with archaeological evidence of pre-contact
tuberculosis in the New World. Comparative genomics of modern
isolates suggests that M. tuberculosis attained its worldwide
distribution following human dispersals out of Africa during the
Pleistocene epoch, although this has yet to be confirmed with ancient
calibration points. Here we present three 1,000-year-old mycobacterial
genomes from Peruvian human skeletons, revealing that a member
of the M. tuberculosis complex caused human disease before
contact. The ancient strains are distinct from known human-adapted
forms and are most closely related to those adapted to seals and sea
lions. Two independent dating approaches suggest a most recent
common ancestor for the M. tuberculosis complex less than 6,000
years ago, which supports a Holocene dispersal of the disease. Our
results implicate sea mammals as having played a role in
transmitting the disease to humans across the ocean.

Mycobacterium tuberculosis has had a long history with humans,
although consensus has not been reached on when this interaction began.
Previous models held that the human-adapted pathogen evolved from
a zoonotic transfer of Mycobacterium bovis following animal
domestication during the Neolithic age. Comparative genomic analyses,
however, suggest that the bovine form and those adapted to other animal
hosts are in fact derived from human strains. This supports a rather
different disease history where humans may have been the most
susceptible host species for early progenitors of strains currently circulating.
Today the majority of M. tuberculosis diversity exists in Africa, implying
that the pathogen probably originated from a monoclonal expansion
therein and achieved its worldwide distribution via human movements.
The observations that M. tuberculosis strains tend to be associated with
human populations and that selection in the bacterium exists at loci
associated with host immune responses indicate that host and pathogen
have had sufficient time to co-evolve. Dating approaches that use
human demographic events for calibration generate substitution rates
that differ by over an order of magnitude depending on the model,
and would thus contribute to vastly different coalescence estimates for
all M. tuberculosis lineages, collectively referred to as the M. tuberculosis
complex (MTBC).

Given the pathogen’s phylogeography, current models are unable to
explain the abundant archaeological evidence for the presence of
tuberculosis in the Americas before European contact. Strains currently
circulating in the Americas are most closely related to those of
European origin, and this has been used to support a European dissemination
from either early settlement or trade associations. This model, however,
is incompatible with bioarchaeological data indicating the presence of
tuberculosis in the pre-contact New World (see Supplementary
Information). Molecular investigations using ancient pre-Columbian material
have identified short conserved regions of mobile elements considered
to be diagnostic for tuberculosis, although these markers offer no
information about phylogenetic placement, and are thus difficult to
authenticate as ancient. While a Pleistocene dispersal following human
movements out of Africa could explain its presence in the pre-contact
New World, the dominance of European-derived lineages in the
Americas today makes this difficult to reconcile without data to support a
complete strain replacement within the past 500 years.

Genomic reconstructions of ancient pathogens provide robust
evidence of DNA authenticity and permit genome-level comparisons. The
success of DNA capture and genomic assembly of a historical MTBC
strain via metagenomic sequencing implies that DNA preservation
of this pathogen may be adequate to address outstanding evolutionary
questions requiring use of archaeological material. Here we apply these
techniques to demonstrate that a previously uncharacterized member
of the MTBC caused human infection in the Americas before European
contact.

We screened 68 skeletal samples representing New World pre- and 
post-contact sites (Supplementary Table 1). All individuals showed skel- 
etal indicators associated with tuberculosis infections. Samples were 
processed via established protocols and were screened for M. tuber- 
culosis DNA by an in-solution capture assay designed for the rpoB, 
gyrA, gyrB, katG, and mpt40 genes (Supplementary Table 2). Capture 
products for samples and negative controls were sequenced on an Illu- 
mina MiSeq and mapped to the corresponding regions in the M. tuber- 
culosis H37Rv reference genome (NC_000962.2). No tuberculosis-specific 
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Figure 1 | Archaeological description of the skeletal samples. a, Map of 
Peru showing locations of archaeological sites; CGIAR SRTM 90m Digital 
Elevation Database version 4.1 (http://srtm.csi.cgiar.org). b, c, Skeletal lesions 
of active tuberculosis from two individuals positive for M. tuberculosis DNA 
(b, individual 58; c, individual 64). Arrows show vertebral lesions, collapse, 
fusion, and kyphosis. 


fragments were found in our negative controls. Only three of the 68 
samples, referred to here as samples 54, 58, and 64, showed convincing 
preservation of tuberculosis DNA (see Supplementary Information, Ex- 
tended Data Fig. 1, and Supplementary Table 1): all three samples were 
recovered from excavations in Peru and derive from Chiribaya cultures 
associated with the Middle Horizon/Late Intermediate period (AD 750-1350)
(Fig. 1). Radiocarbon dates ranging from AD 1028 to AD 1280 (at not
less than 98.5% probability) (Supplementary Table 3) confirm that they
predate European contact. Spectra of DNA damage displayed a pattern
expected of ancient molecules. For comparison, non-enriched libraries
were sequenced on an Illumina MiSeq producing 34,780 to 112,428
reads for each of the three samples, of which 4.6% to 1.6% mapped to 
the human genome (hg19). In contrast, a maximum of only 1.8% of the 
reads mapped to the M. tuberculosis reference genomes (Supplementary 
Table 1), indicating that DNA capture would be necessary for genome 
retrieval. 

DNA libraries treated with uracil DNA glycosylase were generated 
to remove and repair damaged nucleotides, and were subsequently used 
for full genome hybridization capture (Agilent). Array probes were de- 
signed to accommodate known genetic diversity in the MTBC, as well 
as portions of the Mycobacterium avium and Mycobacterium kansasii 
genomes (Supplementary Table 4). Enriched products were sequenced 
on one lane of an Illumina HiSeq 2000. For comparison against a larger 
data set of 259 modern MTBC genomes including the outgroup
Mycobacterium canettii, all ancient reads were mapped against a
computationally constructed ancestor for the MTBC. The recently published
genome from an eighteenth century Hungarian mummy, as well as 14
animal strains from the Mycobacterium caprae, Mycobacterium microti,
and Mycobacterium pinnipedii lineages, were added, along with a strain
recently isolated from a wild chimpanzee. Standard mapping resulted
in heterozygous positions for the mummy, Peruvian samples 54 and 64,
and all modern samples (Extended Data Fig. 2). Increased mapping
stringency removed many heterozygous positions for the Peruvian
samples, suggesting they derived from non-tuberculosis reads; however,
the Hungarian mummy and eight modern samples still displayed
heterozygosity consistent with mixed strains (Extended Data Fig. 3). Our
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Figure 2 | Coverage plots for three ancient genomes. Inner ring: purple, 
AT content; gold, GC content. Coverage rings for samples 64, 58, and 54 shown 
in red, green, and blue, respectively. Vertical lines indicate locations of unique
SNPs. SNPs were identified before exclusion of positions with missing
data from the full 262 genome data set. 
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more stringent mapping reduced overall genomic coverage for all sam- 
ples. The final data set thus consisted of 262 genomes with a minimum 
of 75% coverage, four of which were ancient (Supplementary Table 5). 
A minimum of 20-fold average coverage was obtained for each of the 
Peruvian genomes (Fig. 2 and Supplementary Table 6), implying a 40- 
to 120-fold enrichment (Supplementary Table 6). 

Single nucleotide polymorphism (SNP) analyses were performed by 
comparing all genomes against the constructed ancestor. This identified 
53,177 SNPs for the entire data set, which ranged between 489 and 1,415 
per genome (Supplementary Table 8). As input for phylogenetic assess- 
ments we used an alignment of 22,480 variable positions after removing 
all positions with missing data. Tree reconstructions revealed that our 
Peruvian genomes did not cluster with other human strains, but rather 
were more closely related to the animal lineage (Extended Data Figs 4-6), 
sharing 76 SNPs with modern M. pinnipedii strains (Fig. 3). Genomic 
architecture revealed a region of difference (RD) deletion pattern com- 
mon to all animal lineages (Supplementary Table 7), as well as absence 
of the M. microti-specific RDmic and presence of the M. pinnipedii-specific
RDseal. To our knowledge, M. pinnipedii strains have been isolated
only from seal species restricted to the Southern Hemisphere.
Here they were harvested from captive and wild animals from South 
America and Australia. The three ancient strains share five unique SNPs, 
all of which are non-synonymous (Supplementary Table 9); this indi- 
cates that these strains derive from a common progenitor, with subse- 
quent accumulation of 10-23 substitutions along the three strain-specific 
branches. To investigate possible signals of adaptation, we screened these 
five shared SNPs for putative functional effects. Our computational 
analysis predicted a functional impact of the P44L mutation in Rv2258c, 
encoding a methyltransferase involved in ubiquinone metabolism
(Supplementary Table 9). The SNP in the ctpA gene at codon 62 (D62N)
was not predicted to have a functional impact; however, we identified
two other non-synonymous SNPs (D62G and D62E), also not predicted
to have functional impacts, in the same codon of ctpA at different
positions, each in a lineage 4 modern strain. The occurrence of
homoplasies is uncommon in the MTBC, and therefore potentially indicates
positive selective pressure. A site-wise analysis of positive selection
on codon 62 of ctpA confirmed that all three SNPs may be under
diversifying selection (Supplementary Table 10). The ctp genes encode
efflux ion pumps that are thought to prevent metal accumulation in
the bacterium, hence adaptation may relate to host metal-ion
availability. This notion is supported by the existence of homoplasies in other
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Figure 3 | Phylogenetic analysis. a, Bayesian maximum clade credibility tree 
of 261 MTBC genomes (excluding Hungarian mummy), with estimated 
divergence dates shown in years before present using a model of population 


genes of the efflux pump family in modern MTBC strains (Supplemen- 
tary Table 9). 
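
The alignment filtering mentioned above (keeping only variable positions that have no missing data in any genome) can be illustrated with a minimal sketch. The function name `variable_complete_columns`, the missing-data symbols and the toy sequences are all hypothetical and chosen for the example; the actual pipeline used for the 262-genome data set is not described at this level of detail.

```python
def variable_complete_columns(alignment, missing="N-?"):
    """Return indices of alignment columns that are variable and fully called.

    alignment: dict mapping sample name -> aligned sequence (equal lengths).
    missing:   characters treated as missing data (an assumption of this sketch).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = []
    for i in range(length):
        column = [alignment[name][i] for name in names]
        # Drop the whole position if any sample lacks a call there.
        if any(base in missing for base in column):
            continue
        # Keep only positions where at least two samples differ.
        if len(set(column)) > 1:
            keep.append(i)
    return keep


# Toy data, not real genomes.
toy = {
    "ancestor": "ACGTACGT",
    "strain_a": "ACGTTCGA",
    "strain_b": "ANGTTCGT",
    "strain_c": "ACCTACGT",
}
kept = variable_complete_columns(toy)
```

In this toy case position 1 is discarded because one sample has an `N`, and the invariant positions are discarded because they carry no phylogenetic signal, which mirrors how a larger SNP set shrinks to a smaller complete alignment.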

Bayesian dating analysis used radiocarbon dates as tip calibration (Sup- 
plementary Table 3). The Hungarian mummy sample was excluded 
because of the presence of multiple strains. A clock test rejected the mo- 
lecular clock for all 258 modern genomes (P = 5 X107 4”) (Extended 
Data Fig. 7). Dating analysis using a relaxed clock model and a constant 
population size generated a mutation rate of 4.6 × 10⁻⁸ substitutions
per site per year (3 × 10⁻⁸ to 6.2 × 10⁻⁸, 95% highest posterior density
(HPD) interval). Bayesian skyline plots revealed constant population
sizes for the animal strains and clear indications of expansions in the
human-adapted lineages (Extended Data Fig. 8b). An expansion model
had a negligible influence on the mutation rate, generating 4.9 × 10⁻⁸
substitutions per site per year (3.4 × 10⁻⁸ to 6.4 × 10⁻⁸, 95% HPD),
which corresponds to 0.20 and 0.21 substitutions per genome per year 
for a constant and expanding population model, respectively. This rate 
agrees well with estimates of MTBC evolution in modern epidemiological
contexts, and is more than tenfold faster than those using human
dispersals out of Africa as calibration. Our mutation rates date the most
recent common ancestor (MRCA) for the MTBC (excluding M. canet- 
tii) at 4,449 years before present (yr BP) (2,990-6,062 yr BP 95% HPD) 
and 4,064 yr BP (2,951-5,339 yr BP 95% HPD) for constant size and 
expansion models, respectively (Extended Data Fig. 8a). This dating 
was corroborated by an independent analysis using the sequences from 
the Hungarian mummy sample as the only ancient calibration point.
We separated the individual variants of the two mummy strains by re- 
constructing them onto the MTBC lineage 4 phylogeny (Extended Data 
Fig. 9). Lengths from the terminal branches were estimated by using the 
number of heterozygous variants not present in the modern strains, 
under the assumption that both isolates were equidistant from the root 
of the tree. The year of death 1797 and estimated ages for the penulti- 
mate nodes were used as priors for Bayesian phylogenetic reconstruc- 
tion. Using only synonymous variants, a relaxed clock, and constant 
population size, we estimate the age of the MRCA of the MTBC
(excluding M. canettii) to be 5268.5 yr BP (2689.6-8417.7 95% HPD) with
a synonymous substitution rate of 7.07 × 10⁻⁸ (3.70 × 10⁻⁸ to 1.12 ×
10⁻⁷, 95% HPD) per site per year (Extended Data Fig. 10 and
Supplementary Table 11). This higher substitution rate may be due to lower
selective pressure on synonymous sites.
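
The conversion from a per-site to a per-genome rate quoted in this paragraph can be sketched as a single multiplication. The example takes the per-site rates as 4.6 × 10⁻⁸ and 4.9 × 10⁻⁸ substitutions per site per year and assumes a genome length of 4,411,532 bp, the length of the H37Rv reference; the text does not state which length was used, so the result is only expected to land close to the reported 0.20 and 0.21.

```python
# Assumption: genome length of the M. tuberculosis H37Rv reference, in base pairs.
H37RV_LENGTH_BP = 4_411_532


def per_genome_rate(per_site_rate, genome_length=H37RV_LENGTH_BP):
    """Convert substitutions/site/year into substitutions/genome/year."""
    return per_site_rate * genome_length


constant_model = per_genome_rate(4.6e-8)   # constant population size model
expansion_model = per_genome_rate(4.9e-8)  # population expansion model

print(f"constant:  {constant_model:.3f} substitutions per genome per year")
print(f"expansion: {expansion_model:.3f} substitutions per genome per year")
```

Both values come out at roughly one substitution every five years across the whole genome, which is the scale on which the dating argument rests.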

Our results provide unequivocal evidence of human infection caused 
by members of the MTBC in pre-Columbian South America. Our MRCA, 
which is at least an order of magnitude younger than previous estimates,
presented us with a challenge to explain how a mammalian pathogen
could have reached human populations in the Americas about 10,000
years after inundation of the Bering land bridge. The fact that our ancient
genomes share a common ancestor with strains that are restricted
to seals and sea lions provides a plausible, if unexpected, route of entry
into the New World: within the past 2,500 years pinnipeds probably
contracted the disease from an African host species and carried it
across ocean waters, and exploitation of marine mammals among
coastal peoples of South America facilitated a zoonotic transfer of the
bacterium within the first millennium AD. This parallels similar
zoonoses of marine parasites acquired from seal consumption among
archaeological coastal populations (Supplementary Information).

Owing to the abundance of publications reporting morphological evi- 
dence of pre-Columbian tuberculosis in the region, the coasts of Peru 
and northern Chile have long been recognized in the archaeological 
literature as locations where tuberculosis first came into view in the 
New World. Some have even suggested marine mammals as a potential
source of the infection. The three individuals considered here show
pathological changes consistent with either pulmonary or disseminated 
tuberculosis, so a non-contagious infection acquired from consump- 
tion of contaminated animal products in each case cannot be ruled out. 
In the absence of these data, however, the five unique derived positions 
shared by the ancient Peruvian genomes may provide preliminary evi- 
dence of host specificity. All three genomes share a common ancestor 
that predates the radiocarbon age of our skeletal material by more than 
100 years, and two SNPs show potential signals of adaptation. These 
observations could support a single zoonotic transfer from pinnipeds 
to humans between AD 700 and AD 1000 (Fig. 3). Subsequent host adap- 
tation and dissemination is a compelling prospect for future work. If 
confirmed, this would constitute the first example of a zoonotic trans- 
fer followed by re-adaptation to the human host in the MTBC. 

Such a model could explain the abundance of tuberculosis-like lesions
in the region that accumulate beginning at approximately AD 700
(refs 24, 25). The later appearance of similar skeletal lesions in North
America, at about AD 900, is consistent with either a
transcontinental spread of the pathogen via established trade routes or a
later independent introduction of tuberculosis from a different source. 
The lack of representation of this or any other American-specific strain 
in modern groups supports replacement by a European strain after con- 
tact that quickly moved through indigenous populations on account of 
additional adverse factors such as social marginalization, food insec- 
urity, and potentially facilitative co-circulating infections that reached 
epidemic levels, such as those recorded in northern North America during
the decline of the fur trade. Our data also indicate a subsequent
introduction of M. pinnipedii to Australian seal colonies within the past
700 years (Fig. 3); the potential for similar zoonotic transfers, therefore,
exists in Oceanian populations, although lesions suggestive of
tuberculosis have not been identified in relevant skeletal material.

M. pinnipedii has caused infection in several mammalian host species,
including humans, in the context of zoo outbreaks. Further sampling
of animal-adapted MTBC from both modern and ancient contexts
will be of great value in determining its range of potential host species
and in clarifying directions of transmission. While a human transfer of
the bacterium to marine mammals cannot be ruled out from our data,
we consider this extremely unlikely: humans did not herd or farm seals,
and close, regular contacts would be required for anthroponotic
transmission, as is observed in domestic cattle.

The above assertion of an introduction of MTBC via pinnipeds fol- 
lowed by human adaptation and subsequent transmission throughout 
the Americas can only be confirmed by comparison with additional 
North and South American pre-Columbian MTBC genomes from non- 
coastal groups, which remain elusive despite the inclusion of suitable 
material in our screening (Supplementary Table 1). In addition, our 
dating analyses are based on two independent approaches, although 
each relies on (effectively) a single calibration point. Mutation rate
heterogeneity is documented in other clonal pathogens, and the rejection
of our molecular clock indicates that MTBC evolution is not constant 
among lineages. Additional calibration points from ancient MTBC lin- 
eages around the world will be essential to evaluate the legitimacy of 
our proposed models. Such caveats are of paramount importance con- 
sidering the many investigations that report on members of the MTBC 
identified in skeletal samples that predate our inferred MRCA, or Amer- 
ican material from periods that predate our proposed time of MTBC 
entry. Such claims could only be reconciled with what we propose here 
if (1) rate heterogeneity or horizontal gene transfer is obscuring our 
dating analysis, perhaps as a result of human population expansions 
which increase the availability of susceptible hosts and allow selection 
to operate more quickly, (2) the pathogens identified in the earlier ar- 
chaeological material are in fact not members of the MTBC, but rather 
are ancestral forms that have since undergone replacements, or (3) cer- 
tain techniques for MTBC identification in archaeological material lack 
specificity. 
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Extended Data Figure 1 | Coverage and damage plots for the M. tuberculosis capture regions for samples 54, 58, and 64. 


©2014 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Histogram panels: Sample 54, Sample 64, Sample 58, Hungarian Mummy; x axis from 0 to 1.]


Extended Data Figure 2 | Histograms of SNP allele frequency distributions 
for the ancient samples and the Hungarian mummy sample using standard 
mapping parameters. The x axis denotes the frequency of reads covering a
SNP position in which the SNP was detected. The y axis denotes the number of
observed SNP calls with the respective frequency. All variants with a SNP allele 
frequency below 90% are shown. 

Extended Data Figure 3 | Histograms of SNP allele frequency distributions
for the ancient samples, the Hungarian mummy sample, and two modern
isolates using stricter mapping and filtering parameters. The x axis
denotes the frequency of reads covering a SNP position in which the SNP was
detected. The y axis denotes the number of observed SNP calls with the
respective frequency. All variants with a SNP allele frequency below 90% are
shown.

Extended Data Figure 4 | Maximum parsimony analysis. a, Maximum
parsimony tree of all 262 samples of the complete data set. Positions with
missing data were excluded. b, Subtree of the full maximum parsimony tree
showing the lineage 6 and animal strains. Positions with missing data were
excluded. Branches are labelled with the absolute number of substitutions.
Internal nodes are labelled with bootstrap statistics obtained from 1,000
replicates.

Extended Data Figure 5 | Maximum likelihood analysis. a, Maximum
likelihood tree of all 262 samples of the complete data set. Positions with
missing data were excluded. b, Maximum likelihood subtree showing the
lineage 6 and animal strains. Positions with missing data were excluded.
Internal nodes are labelled with bootstrap statistics obtained from 200
replicates.

Extended Data Figure 6 | Neighbour joining analysis. a, Neighbour joining
tree of all 262 samples of the complete data set. Positions with missing data were
excluded. b, Neighbour joining subtree showing the lineage 6 and animal
strains. Positions with missing data were excluded. Internal nodes are labelled
with bootstrap statistics obtained from 1,000 replicates.


Extended Data Figure 7 | Maximum clade credibility tree of
M. tuberculosis. The tree was estimated using the uncorrelated log-normal
relaxed clock model in BEAST 1.7.5 (ref. 31). The radiocarbon dates of the 
ancient Peruvian strains were used as temporal estimates to date the tree. 
Branch lengths are scaled to years. Branch colours indicate the estimated 
branch substitution rate on the logarithmic scale shown in the legend at the left. 
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Dendritic cells control fibroblastic reticular network 
tension and lymph node expansion 


Sophie E. Acton, Aaron J. Farrugia, Jillian L. Astarita, Diego Mourão-Sá, Robert P. Jenkins, Emma Nye, Steven Hooper,
Janneke van Blijswijk, Neil C. Rogers, Kathryn J. Snelgrove, Ian Rosewell, Luis F. Moita, Gordon Stamp, Shannon J. Turley,
Erik Sahai & Caetano Reis e Sousa


After immunogenic challenge, infiltrating and dividing lymphocytes 
markedly increase lymph node cellularity, leading to organ expansion.
Here we report that the physical elasticity of lymph nodes is main- 
tained in part by podoplanin (PDPN) signalling in stromal fibro- 
blastic reticular cells (FRCs) and its modulation by CLEC-2 expressed 
on dendritic cells. We show in mouse cells that PDPN induces acto- 
myosin contractility in FRCs via activation of RhoA/C and down- 
stream Rho-associated protein kinase (ROCK). Engagement by CLEC-2 
causes PDPN clustering and rapidly uncouples PDPN from RhoA/C 
activation, relaxing the actomyosin cytoskeleton and permitting FRC 
stretching. Notably, administration of CLEC-2 protein to immunized 
mice augments lymph node expansion. In contrast, lymph node expan- 
sion is significantly constrained in mice selectively lacking CLEC-2 
expression in dendritic cells. Thus, the same dendritic cells that initiate
immunity by presenting antigens to T lymphocytes also initiate
remodelling of lymph nodes by delivering CLEC-2 to FRCs. CLEC-2 
modulation of PDPN signalling permits FRC network stretching 
and allows for the rapid lymph node expansion—driven by lympho- 
cyte influx and proliferation—that is the critical hallmark of adap- 
tive immunity. 

Lymph nodes are meeting places for T lymphocytes and antigen-presenting
dendritic cells. T-cell-dendritic-cell interactions are supported
by FRCs, a complex interconnected network that produces and
ensheathes extracellular matrix components that filter draining lymph.
FRC networks additionally provide physical routes for leukocyte traffic,
and chemoattractants for T cells and dendritic cells. Furthermore, contact
with FRCs promotes chemokinesis in dendritic cells, facilitating their
migration within lymph nodes. This is partly due to cytoskeletal changes
in dendritic cells induced upon signalling by the C-type lectin receptor
CLEC-2 when it is engaged by PDPN expressed on FRCs. We asked
whether, in addition to working as a PDPN receptor and promoting
dendritic cell movement along FRCs, CLEC-2 might also act as a ligand,
modulating PDPN function and altering the properties of the FRC
network.

To examine PDPN signalling in fibroblasts, wild-type PDPN tagged
with cyan fluorescent protein (CFP) was overexpressed in NIH/3T3 cells,
which express only low levels of the endogenous protein. Within 30 h
of transfection, CFP was detectable at the plasma membrane, where
it co-localized with mCherry-tagged ezrin, consistent with reports of a
direct interaction between the two proteins (Fig. 1a and Supplementary
Video 1). Ezrin belongs to a family of closely related proteins, ezrin,
radixin and moesin (ERM), which tether the actin cytoskeleton to the
plasma membrane. We therefore examined the localization and
phosphorylation of ERM proteins, along with myosin light chain (MLC), which
mediates actin-dependent contraction, in PDPN-CFP-overexpressing
cells. In contrast to untransfected cells, PDPN-CFP+ NIH/3T3 cells
displayed phosphorylated (p)ERM and pMLC accumulation at the cell
cortex (Fig. 1a) and often rounded up, features typical of contractile
cells. A non-phosphorylatable ezrin T567A mutant formally demonstrated
the key role of ERM phosphorylation in PDPN-driven cell
contraction (Fig. 1b).

To determine which pathways connected PDPN to cell contraction, 
a chemical screen was conducted, which revealed relaxation upon inhi- 
bition of RhoA/C, ROCK or myosin II family proteins (Extended Data 
Fig. 1a, b). Strikingly, treatment with soluble recombinant CLEC-2-Fc 
protein phenocopied RhoA/C and ROCK inhibition, almost completely 
reversing the contraction induced by PDPN-CFP (Fig. 1c). The inhi- 
bition by CLEC-2 was rapid but transient (Fig. 1c) and led to ezrin re- 
distribution from the plasma membrane to the cytoplasm (Fig. 1d). To 
test this in FRCs expressing physiological levels of PDPN, we gener- 
ated lymph node FRC lines (Extended Data Fig. 2 and Methods). Sub- 
lines stably expressing fluorescence resonance energy transfer (FRET) 
biosensors reporting RhoA or Rac1 activity were exposed to CLEC-2-Fc-coated 
10 µm beads. In agreement with the NIH/3T3 studies, RhoA 
activity was immediately and robustly reduced when CLEC-2 beads made 
contact with FRCs (Fig. 1e and Supplementary Video 2). Sudden loss of 
RhoA activity was also evident from temporary loss of adhesion (Supplementary 
Videos 2 and 3). In contrast, Rac1 activity gradually increased 
after exposure to CLEC-2 beads (Fig. 1e and Supplementary Video 3), 
which was confirmed in longer-term experiments by pulldown of GTP-bound 
Rac1 (Fig. 1f). Higher Rac1-GTP levels increased ARP2/3+ lamellipodial 
protrusions, and tail retraction defects were also observed in FRCs 
when PDPN was stably depleted (PDPN knockdown FRCs; Fig. 1f and 
Extended Data Fig. 3). To identify guanine-nucleotide exchange factors 
(GEFs) that could connect PDPN to activation of RhoA/C, we decreased 
the expression of candidates using short interfering RNA (siRNA) and 
found that PDPN-induced contractility primarily requires GEF-H1 in 
NIH/3T3 and FRCs (Extended Data Fig. 1c, d). We also looked for furth- 
er changes in FRCs consistent with decreased RhoA/C activity and found 
that PDPN knockdown or engagement by CLEC-2 caused rapid dis- 
solution of stress fibres (Fig. 1g, Supplementary Video 4 and Extended 
Data Figs 3, 4). Thus, PDPN in fibroblasts associates with ezrin and sig- 
nals to promote RhoA/C-dependent actomyosin-driven contraction. This 
is alleviated by CLEC-2 engagement, causing uncoupling of PDPN from 
ezrin and a RhoA/C to Rac1 switch. We hypothesize that Rac activity 
is increased indirectly as a consequence of reduced RhoA/C activity. 

The cytoplasmic tail of PDPN undergoes phosphorylation and ezrin 
recruitment requires basic residues surrounding Ser 167 (refs 10, 17). We 
therefore tested whether phosphorylation of Ser 167 controlled PDPN- 
induced contractility. Overexpression of the PDPN(S167A) mutant failed 

Figure 1 | CLEC-2 binding uncouples PDPN from RhoA/C- and 
actomyosin-driven fibroblast contractility. a, NIH/3T3 cells expressing 
PDPN-CFP (blue) or untransfected (control), fixed and stained for pERM 
(green) or pMLC (S19) (green) and F-actin (red). Scale bars, 20 µm. 
b, Frequency of contracting NIH/3T3 cells expressing PDPN-mCherry or 
PDPN-mCherry and ezrin(T567A)-GFP. c, NIH/3T3 cells expressing 
PDPN-CFP (green) stained for F-actin (red) and treated with 10 µg ml⁻¹ 
CLEC-2-Fc (15 min). Scale bars, 50 µm. Quantification in the right panel 
depicts mean ± standard deviation (s.d.) of three experiments (>300 cells). 
***P < 0.0005, ****P < 0.00005, Fisher’s exact test. d, NIH/3T3 cells 


to cause contraction in NIH/3T3 fibroblasts and occasionally caused 
collapse of the cytoskeleton, potentially by inhibiting activity of low 
levels of endogenous PDPN (Fig. 2a). In contrast, a PDPN(S167E) phos- 
phomimetic mutant induced contractility comparable to the wild-type 
protein (Fig. 2a, b). Inhibition of ROCK blocked contraction by both 
wild-type and S167E PDPN (Fig. 2b), placing ROCK activity downstream 
of Ser 167 phosphorylation. However, CLEC-2-Fc treatment, although 
inhibiting contraction induced by wild-type PDPN, had no effect on 
S167E PDPN (Fig. 2b), suggesting that regulation of the phosphoryla- 
tion status of Ser 167 is one mechanism by which CLEC-2 uncouples 
PDPN from actomyosin contractility. 

We considered how FRCs express high levels of endogenous PDPN, 
yet, unlike transfected NIH/3T3 cells, do not display hypercontracti- 
lity. As PDPN remained at the plasma membrane when engaged with 
CLEC-2 (Fig. 1d), we hypothesized that it partitions between active and 
inactive pools, the latter being maintained by binding to an inhibitory 
partner such as CD44 (refs 18, 19). FRCs express high levels of both 
PDPN and CD44 (ref. 4) and knockdown of PDPN led to a concordant 
reduction of CD44 (Fig. 2c), suggesting an interaction. NIH/3T3 cells 
express low levels of CD44, perhaps accounting for their susceptibility 
to contraction after overexpression of PDPN. Consistent with that notion, 

expressing PDPN-CFP and ezrin-mCherry treated with 10 µg ml⁻¹ CLEC-2-Fc 
(15 min). Single optical slice (1 µm); scale bars, 20 µm. Pixel co-localization 
analysis is shown at the bottom. e, FRC cell lines expressing RhoA or Rac1 FRET 
biosensors exposed to CLEC-2-Fc-coated beads. Quantification of FRET 
ratio is shown on the right and depicts mean ± s.d. of 15 cells from 2 
independent experiments. f, Left: total and GTP-bound Rac1 in lysates from 
FRCs treated with 10 µg ml⁻¹ CLEC-2-Fc (30 min). Scale bar, 50 µm. Right: 
same analysis in two independent PDPN-knockdown FRC lines (KD1 and 
KD2) versus control line. g, FRC cell lines expressing green fluorescent protein 
(GFP)-MLC (greyscale) treated with CLEC-2-Fc-coated beads. Scale bar, 50 µm. 


co-transfection of CD44 and PDPN into NIH/3T3 fibroblasts mark- 
edly inhibited contraction (Fig. 2d). 

CD44 resides within cholesterol-rich lipid rafts and we tested whether 
CLEC-2 induces PDPN redistribution to such rafts. In steady-state FRCs, 
PDPN and lipid rafts were found in small, partially co-localized clusters 
(Fig. 2e), as described in epithelial cells. CLEC-2-Fc treatment induced 
formation of larger clusters in which PDPN and lipid rafts were more 
often co-localized (Fig. 2e). Notably, depletion of cholesterol from FRC 
membranes with methyl-β-cyclodextrin (MβCD) increased contractility, 
which was prevented by PDPN knockdown (Fig. 2f). CLEC-2 no 
longer inhibited PDPN-induced contraction in NIH/3T3 cells pre-treated 
with MβCD (Fig. 2g). Together, these data support the notion that CLEC-2 
sequesters PDPN within lipid rafts, where increased interaction with 
CD44 prevents signalling to RhoA/C. Interestingly, CD44 can itself also 
drive contractility when excluded from lipid rafts (data not shown), 
suggesting a mutually inhibitory interaction with PDPN. 

To explore the biological significance of PDPN-CLEC-2 interactions 
for FRC function, we examined cell behaviour in three-dimensional col- 
lagen gels. Notably, FRCs reorganized the gel matrix to occupy a smaller 
volume, and this was inhibited by CLEC-2 treatment or PDPN knock- 
down (Fig. 3a). Furthermore, CLEC-2 treatment or PDPN knockdown 

Figure 2 | CLEC-2 binding causes redistribution of PDPN within the 
plasma membrane. a, NIH/3T3 cells expressing V5-tagged PDPN mutants 
(PDPN-V5) stained for V5 (green) and F-actin (red). Scale bars, 50 µm. 
b, Contraction score of NIH/3T3 cells expressing wild-type (WT) or mutant 
PDPN treated with 10 µM ROCK inhibitor (Y27632) (6 h) or 10 µg ml⁻¹ 
CLEC-2-Fc (30 min). Mean ± s.d. of three independent experiments (>300 
cells). ****P < 0.00005, Fisher’s exact test. c, Surface expression of PDPN and 
CD44 in the indicated cells as analysed by flow cytometry. KD, knockdown. 
d, Contraction score of NIH/3T3 cells expressing PDPN-V5 ± CD44-GFP. 
Mean ± s.d. of three experiments (>150 cells). *P < 0.05, Fisher’s exact test. 
e, Confocal slices (0.5 µm) showing surface staining of lipid rafts and PDPN on 


caused marked elongation of FRCs in three-dimensional culture (Fig. 3b 
and Extended Data Fig. 5). To determine the relevance of FRC stretching 
for lymph node dynamics, we investigated FRC network changes after 
induction of inflammation in vivo, which leads to upregulation of CLEC-2 
by both lymph-node-resident and migratory dendritic cells. We first 
examined the cellular composition of draining lymph nodes after subcutaneous 
immunization of mice with ovalbumin (OVA) in complete 
Freund’s adjuvant (CFA). During the afferent phase (days 0-6), T- and 
B-cell numbers increased rapidly and total lymph node cellularity augmented 
2-3 fold (Extended Data Fig. 6). However, numbers of FRCs 
(CD45− PDPN+ CD31−) remained constant until day 6 (Fig. 3c). This 
lag in FRC proliferation has been previously observed, although the 
kinetics probably depend on the type and strength of the inflammatory 
stimulus. In contrast, blood endothelial cells (CD45− PDPN− CD31+), 
a distinct lymph node stromal population, increased in number from 
the earliest time point (Extended Data Fig. 6). 

If FRCs do not proliferate during the early stages of acute inflammation, 
then to accommodate increased lymph node size the FRC network 
needs to ‘stretch’ to avoid disruption. Consistent with that, the mean 
forward scatter of lymph node FRCs, an indication of cell size, increased 
after immunization (Fig. 3d) and the integrity of the FRC network was 
maintained (Fig. 3e). A gap analysis algorithm revealed significantly 
larger spaces between the reticular network branches after immunization, 
consistent with FRC stretching (Fig. 3f). In a complementary approach, 
we used PDGFRαKI-H2B-GFP mice (Extended Data Fig. 7) and calculated 
the spacing between FRC nuclei (GFP+) using automated morphometric 
analysis software, which confirmed that FRCs spread further 

primary FRCs treated with 10 µg ml⁻¹ CLEC-2-Fc protein (45 min). Scale bars, 
50 µm. Co-localization correlation coefficient R is shown on the right; each 
point represents one cell. ****P < 0.0001, Mann-Whitney U-test. f, Left: FRC 
cell lines treated with 250 µM MβCD (6 h) stained for F-actin (red) and DNA 
(blue). Scale bars, 50 µm. Right: contraction score in the indicated FRC cell lines 
treated with MβCD. Numbers along x axis indicate dose of MβCD in 
micromolar concentrations. Mean ± s.d. of three experiments (>300 cells). 
****P < 0.00005, Fisher’s exact test. g, Contraction score in PDPN-CFP- 
expressing NIH/3T3 cells pre-treated with 250 µM MβCD (6 h) and 
subsequently treated with 10 µg ml⁻¹ CLEC-2-Fc (15 min). Mean ± s.d. of 
three experiments (>300 cells). ****P < 0.00005, Fisher’s exact test. 


apart after immunization (Fig. 3g). Together, these data indicate that 
FRCs expand and that the pre-existing FRC network enlarges in res- 
ponse to acute inflammation such as that following immunization. 
The profound cytoskeletal changes in FRCs after CLEC-2 binding 
in vitro suggested that inhibition of PDPN-induced contractility by 
CLEC-2+ dendritic cells might aid FRC network enlargement in vivo. 
Consistent with this, lymph node influx of immigrant dendritic cells 
(CD45+ CD11c+ MHCIIhi) bearing high CLEC-2 levels peaked at day 
2 after OVA/CFA immunization (Fig. 3c). We therefore examined 
lymph node architecture and expansion in Cd11c-cre × Clec1bfl/fl mice 
(CD11cΔCLEC2). These mice display selective ablation of CLEC-2 in 
CD11c+ cells (Extended Data Fig. 8), the majority of which within the 
T-cell zone are dendritic cells. Interestingly, in >24-week-old CD11cΔCLEC2 
mice, steady-state lymph node size was significantly reduced compared 
with controls (Fig. 4a), which was not the case in younger mice (Fig. 4b; 
non-draining lymph node). However, after immunization, expansion 
of draining lymph nodes was attenuated and spaces within the FRC network 
were smaller in young CD11cΔCLEC2 mice compared with controls 
(Fig. 4b, c). Finally, the lymph nodes of CD11cΔCLEC2 mice remained 
more rigid and less deformable than controls after immunization (Fig. 4d). 
The attenuated expansion of lymph nodes in CD11cΔCLEC2 mice could 
be due to reduced numbers of immigrating dendritic cells, decreasing 
antigen availability and limiting T-cell expansion. However, similar re- 
sults were obtained upon administration of incomplete Freund’s adju- 
vant without OVA or mycobacterial antigens (data not shown). Moreover, 
treatment by subcutaneous injection of recombinant CLEC-2-Fc pro- 
tein but not a control Fc reagent (data not shown) after CFA/OVA 

Figure 3 | The FRC network stretches to accommodate acute increases in 
lymph node cellularity. a, Collagen matrix contraction by FRC cell lines 
treated with 10 µg ml⁻¹ CLEC-2-Fc, anti-PDPN antibody (Ab) or stably 
depleted of PDPN (PDPN KD1 and 2). Representative image is shown at the 
top. Mean ± s.d. of eight replicates from two experiments. ***P < 0.0001, 
****P < 0.00001, one-way analysis of variance (ANOVA), Dunnett’s multiple 
comparisons. b, Maximum three-dimensional length from cells as in 
a calculated from 100 µm confocal z-stacks. Graph indicates median, 25th and 
75th percentiles (range 10th-90th percentile). Data are from two independent 
experiments (>80 cells). *P < 0.05, ****P < 0.00001, one-way ANOVA, 
Dunnett’s multiple comparisons. c, Lymph node (LN) mass, number of 
migratory dendritic cells (DCs; CD45+ CD11c+ MHCIIhi) and number of 
FRCs (CD45− PDPN+ CD31−) in lymph nodes draining the site of OVA/CFA 
immunization. Each point represents one lymph node. Data show mean ± s.d. 
of two experiments with 8-12 mice per time point. Day 0 represents non-immunized 
mice. *P < 0.05, **P < 0.001, ***P < 0.0001, ****P < 0.00001, 
one-way ANOVA, Tukey’s multiple comparisons. NS, not significant. 
d, Forward scatter (FSC) of FRCs after OVA/CFA immunization (day 6). 
**P < 0.001, Mann-Whitney U-test. e, ER-TR7 monoclonal antibody staining 
(green) of the T-cell zone of draining or non-draining lymph nodes after 
OVA/CFA immunization (day 6). Scale bars, 30 µm. f, Left: PDPN staining 
(white) of lymph node sections as in e converted to binary images for gap 
analysis. Scale bars, 100 µm. Right: gaps (coloured circles) within the FRC 
network. White box indicates area at higher magnification. Quantification is 
shown on the far right. ****P = 4.194 X 10 ° (proportion of radii >15 µm), 
Fisher’s exact test. g, Morphometric analysis of GFP+ nuclei (FRCs) in draining 
or non-draining lymph node sections from PDGFRαKI-H2B-GFP mice after 
CFA/OVA immunization (day 6) (left). Magnification, ×10. Anti-GFP 
(brown), haematoxylin (blue). Quantification of average area occupied by 
individual FRCs is shown on the right. ****P < 0.00001, Mann-Whitney 
U-test. Data in f and g are from multiple sections from eight mice in two 
independent experiments. 


immunization markedly augmented lymph node expansion (Fig. 4e) and, 


CD11cΔCLEC2 mice. Together, these data indicate that CLEC-2 delivery 
by dendritic cells is required for maintaining FRC network archi- 
tecture in vivo and is permissive for acute increases in lymph node size 
driven by cell influx provoked by local inflammation. 

Lymph nodes are dynamic structures that must rapidly expand to 
accommodate leukocyte recruitment and proliferation. Given that lymph 


Figure 4 | CLEC-2+ dendritic cells are required for lymph node swelling 
during adaptive immune responses. a, Mass of skin draining lymph nodes 
from CD11cΔCLEC2 mice and Cre-negative littermate controls >24 weeks old. Data 
are normalized to average lymph node (LN) mass of Cre-negative control. Each data 
point represents one lymph node. ****P < 0.00001, Mann-Whitney U-test. 
b, Lymph node mass (mg) of draining and non-draining lymph nodes from 
8-12-week-old CD11cΔCLEC2 mice and Cre-negative littermates (control) 
immunized with OVA/CFA (day 7). Each point represents a lymph node. 
Data are from three experiments. **P < 0.001, two-way ANOVA, Tukey’s 
multiple comparisons. c, Gap analysis of draining lymph nodes from 
immunized control (top) and CD11cΔCLEC2 mice (bottom), quantified on the 
bottom left. **P = 0.0001759 (radii >8 µm), Fisher’s exact test. Scale bar, 
100 µm. d, Lymph node deformation after compression of whole lymph nodes 
with 1.4 N force. Each point represents one lymph node. **P < 0.05, one-way 
ANOVA. e, Lymph node mass of CD11cΔCLEC2 mice and Cre-negative littermates 
(control) 7 days after CFA/OVA immunization and treatment with 10 µg 
CLEC-2-Fc or PBS (days 1 and 3). Each point represents one lymph node; data 
are from three experiments. **P < 0.001, ***P < 0.0001, two-way ANOVA, 
Tukey’s multiple comparisons test. NS, not significant. 

nodes can expand tenfold during adaptive immune responses while maintaining 
integrity, stromal components must by necessity proliferate. 
However, in the early phases of adaptive immune responses or in response 
to acute inflammation, FRC proliferation is insufficient to account for 
lymph node expansion (Fig. 3b). We show that early lymph node expansion 
is permitted by FRC network relaxation, induced by increased availability 
of CLEC-2hi dendritic cells. We previously reported that CLEC-2 
engagement by PDPN promotes dendritic cell migration along the FRC 
network. Our data now show that PDPN is not simply a ligand for 
CLEC-2 but that it also reverse signals into FRCs to control actomyosin 
contractility. Our results suggest that a function of endogenous PDPN 
on FRCs is to cause stromal network contraction and create physical 
tension in lymph nodes. This is offset by constant contact between FRCs 
and CLEC-2+ resident dendritic cells, balancing contractility and controlling 
lymph node cellularity. The influx of additional CLEC-2hi dendritic 
cells, combined with the upregulation of CLEC-2 on resident 
dendritic cells during acute inflammation, increases inhibition of PDPN, 
allowing the FRC network to stretch. We predict that this same mech- 
anism can promote lymph node reduction as the short wave of increased 
CLEC-2 availability ends and PDPN activity returns. Whether the un- 
derlying collagen network is similarly elastic and to what extent it acts 
to limit or promote lymph node expansion is an as yet unaddressed ques- 
tion. Interestingly, CLEC-2 may not be sufficient for sustained lymph 
node expansion, as administration of CLEC-2-Fc in the absence of in- 
flammation did not affect lymph node size. Rather, CLEC-2 inhibition 
of PDPN is permissive for stretching and the influx of leukocytes via 
the high endothelial venules and afferent lymph probably provides the 
force for expansion. It is interesting to speculate that the stretching mech- 
anism described in this study may also help initiate subsequent FRC 
proliferation, given the emerging connections between tension and cell 
cycle regulation”. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


Mice. Experiments were performed in accordance with national and institutional 
guidelines for animal care and approved by the Institutional Animal Ethics Com- 
mittee Review Board, Cancer Research UK and the UK Home Office. Wild-type 
C57BL/6J mice were purchased from Charles River Laboratories. PDGFRαKI-H2B-GFP 
mice (B6.129S4-Pdgfratm11(EGFP)Sor/J) were purchased from Jackson Laboratories. 
To generate CLEC-2 floxed mice on a C57BL/6 background, C57BL/6 mouse 
Clec1b genomic regions were cloned into a pFloxRI+TK targeting vector from the 
bacterial artificial chromosome (BAC) clone R248K14 using the Red/Et recom- 
bination of Quick and Easy BAC modification kit (Gene Bridges). Primers used 
were: Clec1b BAC Fwd, TATTACCTGATGCTGTTACATCTCAGCTCTGCAG 
TATTTAGCCACCTTAGAGTTCCTAGCTGCTGACTCT gggtaccgagctcgaattct 
accg; Clec1b BAC Rev, CTGGGTTCTTTCCAGCTTCTGGCTATTATAAATAA 
GGCTGTTATGAACATAGTGGAGCATGTGTCCTTCT Tgcggccgccaccecesteg 
agctcca. Uppercase letters indicate regions homologous to Clec1b regions and lowercase 
letters indicate regions homologous to pFloxRI+TK vector. PCR was performed 
using Pwo DNA polymerase (Roche) following the manufacturer’s instructions. 

The 5’ loxP site was introduced in the intronic region between exon 1 and exon 2 
by insertion of the loxP-pgk-gb2-NEO-loxP cassette (Gene Bridges) using the fol- 
lowing primers: 1st loxP Fwd, AAAACCCAAAACCAAAAAACCAAAACCAAC 
AACAAAACAAAAAAACAGATaattaaccctcactaaagggcg; 1st loxP Rev, ACTTAT 
TCTCTGTCCATTCTAACATATAACTGGCTACCAAGGCCACGTGTrtaatacg 
actcactatagggctc. Uppercase letters indicate regions homologous to Clec1b regions 
and lowercase letters indicate regions homologous to loxP-pgk-gb2-NEO-loxP cas- 
sette. PCR was performed using Pwo DNA polymerase (Roche) following the man- 
ufacturer’s instructions. 

The vector was then transformed into Cre-expressing Escherichia coli EL350 (gift 
from A. Behrens), leading to recombination of the cassette and leaving a single loxP site. 

Next, the FRT-pgk-gb2-NEO-FRT-loxP cassette was introduced into the intronic 
region between exon 4 and exon 5 using the following primers: Rb2 Fwd, tcccatgtc 
aagcattttggaatgctgageeeaaacattgaaatgctgttaattaaccctcactaaaggsc; Rb2 Rev, tctcagag 
gagcacacagtgcaaaccattaagaaacacatgaaaaggaaataatacgactcactatagegctcg. Uppercase 
letters indicate regions homologous to Clec1b regions and lowercase letters represent 
regions homologous to FRT-pgk-gb2-NEO-FRT-loxP cassette. PCR was performed 
using Pwo DNA polymerase (Roche) following the manufacturer’s instructions. 

The targeting vector was linearized by SfiI digestion (New England Biolabs), 
precipitated by phenol/chloroform and electroporated into PRX-B6N C57BL/6N 
embryonic stem (ES) cells. 

Screening for homologous recombination in ES cells and mice was done by PCR 
using primers inside the FRT-pgk-gb2-NEO-FRT-loxP and outside the homology 
arm: Clec1b Gen Rev, agaccctgagaaggctgga; NEO 3’ sense, gctcccgattcgcagcgcate. 

Deletion of the NEO cassette was performed by crossing Clec1b-targeted mice 
with β-actin-Flp (B6.Cg-Tg(ACTFLPe)9205Dym), and thereafter generating Clec1bfl/fl 
by interbreeding and screening for deletion of the NEO cassette using the following 
primers: Seq FRT Fwd, cctggtaaggaggetcccat; Seq FRT Rev, atgagtctgctagggatgct. 

Generation of CD11cΔCLEC2 was achieved by crossing Clec1bfl/fl with CD11c-Cre 
mice (B6.Cg-Tg(Itgax-cre)1.1Reiz; gift from B. Reizis). Both males and females were 
used for in vivo experiments and were aged 8-12 weeks unless otherwise stated in 
figures. Cre-negative littermates were used as controls in all experiments. 
qPCR analysis of Clec1b messenger RNA. CD11c+ cells from day 9 cultures of 
bone marrow in granulocyte-macrophage colony stimulating factor (GM-CSF) 
(bone marrow dendritic cells), treated with lipopolysaccharide (LPS) for 6 h, or 
from spleens were enriched by MACS using CD11c beads (Miltenyi). RNA was 
extracted using an RNeasy mini kit (Qiagen) and cDNA generated using Super- 
script II reverse transcriptase (Invitrogen). qPCR was carried out using Sybr Green 
comparing Clec1b expression to GAPDH in each sample. Primers for Clec1b: Fwd, 
TTTGAGCACAAGTGCAGCCCC; Rev, AAGCAGTTGGTCCACTCTTG. 
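
The Methods state only that Clec1b expression was compared with GAPDH in each sample; the quantification formula is not given. As an illustration, a minimal sketch of the common ΔCt calculation follows. The use of ΔCt, the function name and the default doubling efficiency are assumptions made for this example, not details taken from the paper.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        efficiency: float = 2.0) -> float:
    """Expression of a target gene relative to a reference gene.

    Assumes (hypothetically) the delta-Ct method with the same
    amplification efficiency for both amplicons; 2.0 means the
    product doubles every cycle.
    """
    delta_ct = ct_target - ct_reference
    return efficiency ** (-delta_ct)


# Example with made-up Ct values: Clec1b crosses threshold 5 cycles
# later than GAPDH, so it is 2**-5 as abundant under these assumptions.
example = relative_expression(ct_target=25.0, ct_reference=20.0)
```

Each sample is normalized to its own reference Ct, so values can be compared between samples without assuming equal input RNA.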
Constructs. PDPN-CFP was as previously described. RhoA and Rac1 FRET biosensors 
were provided by M. Matsuda. GFP-MLC was as previously described. 
PDPN wild type and S167A and S167E mutants were cloned into pcDNA3.1-V5-His 
by PCR using EcoRI and NotI restriction sites. 

Cell lines. NIH/3T3 fibroblasts were cultured in DMEM plus glucose (Life Tech- 
nologies, Invitrogen) with 10% fetal bovine serum (FBS) and penicillin and strep- 
tomycin (PS). Cells were incubated and maintained at 37 °C in 5% CO2. Mouse 
lymph node FRC lines were generated by first digesting skin-draining lymph nodes 
from C57BL/6 mice and then culturing adherent stromal cells as previously reported. 
On day 4, stromal cells were immortalized by infection with HPV-E6-encoding 
retrovirus and selected with 2.5 µM puromycin as for the generation of carcinoma-associated 
fibroblast cell lines. Immortalized FRCs were isolated by sequential 
MACS depletion of CD45+ and CD31+ cells using biotin-conjugated antibodies 
and anti-biotin beads (Miltenyi). FRC cell lines were maintained in DMEM plus 
glucose (Life Technologies, Invitrogen) with 10% FBS, PS and 1% Insulin-Transferrin-Selenium 
(Life Technologies, Invitrogen) at 37 °C in 5% CO2, and split using cell 
dissociation buffer (Life Technologies, Invitrogen). Stable knockdown of PDPN in 
FRCs was achieved with two different short hairpin (sh)RNA lentiviruses obtained 
from The RNAi Consortium of the Broad Institute that targeted the following 
sequences: GCTGCATCTTTCTGGATAATA (PDPN KD1) and GTTCTCCCAACACATCTGAAA 
(PDPN KD2). FRC cell lines expressing RhoA or Rac1 biosensors 
were generated by co-transfection of the biosensor plasmid with PiggyBac transferase 
using Lipofectamine 2000, followed by selection with 5 µM blasticidin for 
2 weeks. GFP-MLC-expressing FRC cell lines were generated by lentiviral transduction 
and cell sorting. All cell lines were regularly tested for absence of mycoplasma 
contamination by the Cell Services Laboratory, Cancer Research UK London 
Research Institute. 

Overexpression studies. NIH/3T3 cells were plated at a density of 50,000 cells ml⁻¹ 
in a glass-bottomed 24-well plate (MatTek) the day before transfection with plasmids 
encoding GFP, PDPN-CFP or PDPN-V5-His using Effectene transfection 
reagent (Qiagen) as per supplier’s instructions. After 24 h, cells were treated with 
chemical inhibitors at the concentrations indicated (see Extended Data Fig. 1b) for 
6 h before fixation. CLEC-2-Fc was added at 10 µg ml⁻¹ for the time indicated in 
figures before fixation. In co-transfection experiments, plasmids encoding PDPN-CFP 
and ezrin-mCherry or PDPN-V5-His and CD44-GFP were added in equal 
amounts to the transfection mix. The cells were analysed by fluorescence microscope 
(×20, Nikon Eclipse TE2000-S) and contraction status scored manually. Cells 
were grouped into either contracted, partially contracted, spread or collapsed, depending 
on their morphology. High-resolution images were taken using a confocal microscope 
(Zeiss 710 using a ×20/0.8 NA objective). 

siRNA knockdown of RhoGEFs. siRNA smartpools containing the following 
sequences were obtained from Dharmacon. NIH/3T3 cells or FRC cell lines were 
transfected using Dharmafect reagent 2 according to the manufacturer’s instruc- 
tions. Mcf2 (Dbl), UGAUCAGUCUCCCAAAUUG; CCAUGCCUUUCAUCAA 
UUA; GGUGAUAACCGCAAAUUUG; CAAAGUGCAUAGACUCUUA; 
GEF-H1 (ARHGEF2), CAACAUUGCUGGACAUUUC; GCACUGGGAUGCUGGA 
AGA; GUACCAAGGUCAAGCAGAA; UGGAAUCCCUUAUUGAUGA; LARG 
(ARHGEF12), GAUCAAGUCUCGCCAGAAA; GGACGGAGCUGUAAUUG 
CA; GAAAGGAGUUCCACAAUGC; GAAAGGAGUUCCACAAUGC; p115 
rhogef (ARHGEF1), GGUGUAACCUCAUCACUGA; GGAAAGACCGAGGC 
AACUA; GGCAAGAGGUCAUCAGCGA; GGGCUGAGCAGUAUCCUAG. 
Immunofluorescence staining. Cells were fixed with 4% paraformaldehyde (PFA) 
in PBS for 10 min before permeabilization in PBS containing 0.2% Triton-X for 10 min 
at room temperature. The cells were stained with 4′,6-diamidino-2-phenylindole 
(DAPI; Sigma, d9542) to reveal DNA in cell nuclei and tetramethylrhodamine 
(TRITC)-phalloidin (Sigma, p1951) to reveal F-actin in 3% bovine serum albumin 
with PBS plus 0.1% Tween-20. Cells were stained using anti-pMLC (S19) (Cell 
Signaling Technology #3675) or anti-pERM antibody (Cell Signaling Technology 
#3141) followed by appropriate Alexa Fluor-conjugated secondary antibodies (Invitrogen). 
PDPN-V5 was stained using anti-V5 fluorescein isothiocyanate (FITC)- 
conjugated antibody (Invitrogen R963-25). ARP2/3 was stained using anti-p34-Arc/ 
ARPC2 antibody (07-227 Millipore). 

Lipid raft co-localization analysis. Lipid raft labelling was carried out on ice on 
unfixed cells according to manufacturer’s instructions (Vybrant lipid raft labelling 
kit 555, Invitrogen). Cells were then fixed in 4% paraformaldehyde before staining 
for PDPN (8.1.1 AF660-conjugated; eBioscience 50-5381-80). Confocal immunofluorescence 
images (×63/1.4 NA oil immersion objective) of FRCs plated on glass 
either untreated or treated with 10 µg ml⁻¹ CLEC-2-Fc for 30 min before staining 
were analysed using Zen image analysis software (Zeiss). The correlation coeffi- 
cient R was calculated based on pixel intensity. 

CLEC-2-Fc treatment in vitro. Generation of CLEC-2-Fc was as previously 
described. Soluble CLEC-2 was used at 10 µg ml⁻¹ diluted in cell culture medium. 
For CLEC-2-beads, 10 µm protein A-coated microspheres (Bangs Laboratories) were 
incubated with 100 µg ml⁻¹ CLEC-2-Fc diluted in PBS for 1 h at 4 °C then washed 
with cell culture medium. CLEC-2-Fc or CLEC-2-beads were added for the time 
indicated in the figures before fixation. 

Lymph node expansion in vivo. Mice were immunized with 100 µl of an emulsion 
of OVA in CFA (100 µg OVA per mouse) or PBS in incomplete Freund’s 
adjuvant (IFA) subcutaneously in the right flank. Draining inguinal lymph nodes 
were taken for analysis at the indicated days after immunization; left-side non-draining 
lymph nodes were taken for comparison. Where mice were treated with 
CLEC-2-Fc, they received 10 µl (1 µg in PBS) subcutaneously adjacent to site of 
immunization on days 1 and 3. Lymph nodes were first weighed, then digested as 
previously described. Cells were counted and stained for flow cytometry analysis. 
Alternatively, intact lymph nodes were fixed in 10% formalin for immunohisto- 
chemistry analysis. 

Lymph node deformation assay. Draining and non-draining inguinal lymph 
nodes were taken from CD11cΔCLEC2 mice and littermate controls on day 5 after 
CFA immunization. Lymph nodes were placed on the contact points of a digimatic 
thickness gauge (Mitutoyo) and subjected to 1.4 N of applied force. Deformability 
was calculated as follows: Deformability = 1 − (lymph node size under 1.4 N/lymph 
node size before force was applied). Each lymph node was measured twice and the 
results averaged. 
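
The deformability calculation can be written out as a short sketch. The function and variable names are illustrative only, and the example readings are invented rather than taken from the data.

```python
def deformability(size_before: float, size_under_force: float) -> float:
    """Deformability = 1 - (size under 1.4 N / size before force)."""
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    return 1.0 - size_under_force / size_before


def mean_deformability(readings):
    """Average repeated (size_before, size_under_force) readings of one node."""
    values = [deformability(before, under) for before, under in readings]
    return sum(values) / len(values)


# Two hypothetical gauge readings (mm) of the same lymph node.
node_score = mean_deformability([(2.0, 1.5), (2.0, 1.0)])
```

A value of 0 means the node did not compress at all under the applied force; values closer to 1 indicate a more deformable node.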

Flow cytometry. Cells isolated from lymph nodes, FRCs or NIH/3T3 cell lines 
were suspended in FACS buffer (PBS 2% FCS, 2 mM EDTA) and first blocked with 
anti-CD16/CD32 (eBioscience) for all staining procedures. Cells were counted on 
a FACS Calibur by reference to fluorescence beads. Stained cells were analysed on 
either a FACS Calibur or LSRII. Antibodies used for staining: CD45.1 (BD Biosciences), 
CD140a (clone APA5 eBioscience), CD31 (BD Biosciences), PDPN (clone 8.1.1 
eBioscience), CD35 (clone 8C12 BD Biosciences), VCAM-1 (clone M/K-2 Abcam), 
CD44 (clone IM7 BD Biosciences), CD3 (BD Biosciences), CD19 (clone ID3 BD Bio- 
sciences), MHCII (I-A/I-E BD Biosciences), CD11c (clone HL3 BD Biosciences). 
Immunohistochemistry. Tissue sections 4 µm thick were cut at three levels (with 
150 µm between the levels) from formalin-fixed paraffin-embedded lymph nodes. 
These sections were then stained using the Vector ABC elite detection system (PK6100 
1:250, Vector Laboratories). Slides were first incubated with a primary antibody for 
1h (anti-GFP: AB6673 1:350, Abcam; anti-PDPN: DM3501, Acris), followed by 
incubation with biotinylated secondary antibody (1:250, Vector Laboratories) 
for 45 min. ABC complex was then applied for 30 min and staining was 
completed by 3 min incubation with DAB and chromogen (SK 4100, Vector Labo- 
ratories). Slides were counterstained with haematoxylin and mounted. ER-TR7 
staining was conducted on 10-µm-thick frozen lymph node sections (Santa Cruz, 
sc-73355-AF488). 

Gap analysis. Quantification of gap sizes was carried out in MATLAB. PDPN signal 
was isolated and converted to greyscale. The images were thresholded, background 
subtracted, small objects removed, then converted to binary images using ImageJ 
software. A circle-fitting algorithm was applied whereby, in each step, the largest 
circle that could fit in the gaps and that did not overlap with other fitted circles was 
recorded. The distribution of radii of circles that fill the image was then weighted 
according to area such that larger circles had a proportionally greater weighting 
than smaller circles. The plot of distribution of radii was smoothed to transform the 
data from a discrete pixel size distribution to a continuous micrometre-size distribution. 
Raw data were analysed using Fisher’s exact test (P = 4.194 10°’) to determine 
differences in circles with radius >15 µm. Each analysis was performed on 
images of the FRC network from >8 individual mice per group. No images were 
excluded from the analysis and all data were combined and are represented in the 
graphs shown (16 images per group). MATLAB script is available upon request. 
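
The circle-fitting step described above can be illustrated with a small, brute-force sketch. This is not the authors' MATLAB script: it is a Python approximation that works in pixel units, treats the image border as a boundary, and uses hypothetical names (`fit_circles`, `area_weighted_fraction`, `min_radius`) chosen for this example.

```python
import math


def fit_circles(mask, min_radius=1.0):
    """Greedily fit non-overlapping circles into the gaps of a binary image.

    mask[y][x] is True where network signal is present. At each step the
    largest circle that fits in a gap without overlapping the network,
    the image border or an earlier circle is recorded.
    """
    height, width = len(mask), len(mask[0])
    network = [(x, y) for y in range(height) for x in range(width) if mask[y][x]]
    radius = {}  # largest radius still allowed at each gap pixel
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                continue
            to_border = min(x, y, width - 1 - x, height - 1 - y) + 1
            to_network = min((math.hypot(x - nx, y - ny) for nx, ny in network),
                             default=math.inf)
            radius[(x, y)] = min(to_border, to_network)
    circles = []
    while radius:
        (cx, cy), r = max(radius.items(), key=lambda item: item[1])
        if r < min_radius:
            break
        circles.append((cx, cy, r))
        # Shrink or discard candidates so later circles cannot overlap this one.
        for (x, y), current in list(radius.items()):
            allowed = math.hypot(x - cx, y - cy) - r
            if allowed <= 0:
                del radius[(x, y)]
            else:
                radius[(x, y)] = min(current, allowed)
    return circles


def area_weighted_fraction(circles, threshold):
    """Share of total fitted-circle area contributed by radii above threshold."""
    total = sum(math.pi * r * r for _, _, r in circles)
    large = sum(math.pi * r * r for _, _, r in circles if r > threshold)
    return large / total if total else 0.0
```

The brute-force distance search is only practical for small images; a distance transform would be the usual replacement at full resolution. Converting radii from pixels to micrometres and smoothing the distribution, as described in the text, are left out of this sketch.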
Automated morphometric analysis. The slides were digitized with a commer- 
cial image analysis system (Ariol; Leica Biosystems). The program was trained to 
recognize GFP+ stained nuclei by size, shape and staining intensity. T-cell areas 
were identified by density of haematoxylin staining and traced out manually on 
each lymph node in the scan. Automated analysis then counted the total number 
of GFP+ nuclei in each T-cell area (24 areas analysed per group). Area/FRC nuclei 
were compared using Mann-Whitney U-test. 

Rac1-GTP pulldown assay. FRC cell lines either treated with CLEC-2-Fc for 30 min 
or stably depleted of PDPN (shRNA) were subjected to Rac1 pulldown and analysis 
as per manufacturer’s instructions (kit 16118 Pierce). Rac1 levels in pulldowns 
and whole lysate were compared by western blot. 

Three-dimensional cell culture and gel contraction assay. Control or PDPN knockdown 
FRCs were seeded at 10,000 per well in 150 μl collagen/matrigel matrix. 
Gels were set at 37 °C for 30 min then covered with cell culture medium. In some 
wells, the following were added to both the gel mix and medium: 10 μg ml⁻¹ CLEC-2-Fc, 
10 μM ROCK inhibitor (Y27632) or 10 μg ml⁻¹ anti-PDPN antibody (R&D, 
AF3244). Contraction of the gel at day 3 was quantified as the ratio of contracted 
gel/original area and plotted relative to control. Gels were stained with TRITC-labelled 
phalloidin and DAPI for maximum length analysis (Imaris). The furthest 
points of each individual cell were measured in x,y,z coordinates and vectors 
calculated for comparison (Extended Data Fig. 5). 
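
The two measurements in this assay reduce to simple arithmetic, sketched below in Python. The function names and the example numbers are hypothetical. The length calculation assumes the two furthest points of a cell are given as x,y,z coordinates in the same unit, and is not a reproduction of the Imaris workflow.

```python
import math


def relative_contraction(contracted_area, original_area, control_ratio):
    """Contracted/original gel area, expressed relative to the control ratio."""
    ratio = contracted_area / original_area
    return ratio / control_ratio


def max_cell_length(end_a, end_b):
    """Euclidean length of the vector joining the two furthest points of a cell."""
    return math.dist(end_a, end_b)


# Hypothetical example values, for illustration only.
treated = relative_contraction(contracted_area=30.0, original_area=100.0, control_ratio=0.6)
length = max_cell_length((0.0, 0.0, 0.0), (3.0, 4.0, 12.0))
```

A value of 1 for `relative_contraction` means the gel area ratio equals that of the control; values further from 1 indicate a larger difference from control.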

Statistics. Sample size for in vivo experiments was determined by litter size as 
littermate controls were used in all comparisons. For in vitro experiments, all cells 
within the sample were scored and none excluded and experiments were repeated 
at least once to ensure reproducibility and adequate statistical power. Category data 
from overexpression studies were analysed using Fisher’s exact test (R software). 
Changes to contraction and morphology were analysed using Mann-Whitney 
U-tests. Analysis of cell populations during a time course of immunization was 
conducted with one-way ANOVA and followed by Kruskal-Wallis multiple comparisons 
test. Comparison of lymph node size and cellularity between control and 
CD11cΔCLEC-2 across different treatment groups was analysed using two-way ANOVA 
followed by multiple comparisons test between treatments and genotypes. Appropriate 
statistical tests were chosen for the data set following advice from a mathematician 
(R.P.J.). Chemical screen for cell contraction inhibitors was scored by an 
independent observer (A.J.F.) who was unbiased as to the predicted results. 
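
For category data of the kind described in this section (for example, contracted versus non-contracted cells in two conditions), a two-sided Fisher's exact test can be sketched as below. The authors report using R; this pure-Python version is only an illustration of the calculation, and the function name and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 8 of 10 cells contracted in one group, 1 of 6 in the other.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

The small relative tolerance in the comparison guards against floating-point ties between tables that are equally likely.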
Inhibitor      Target                                       Buffer   Final conc. 
SB431542       Transforming growth factor-beta receptor     DMSO     10 μM 
LY294002       Phosphoinositide 3-kinase (PI3K)             EtOH     20 μM 
GM6001         Matrix metalloproteinase (MMP)               H2O      10 μM 
PP2            Src                                          H2O      10 μM 
Gö6983         Protein kinase C (PKC)                       DMSO     10 μM 
SP600125       Jun N-terminal kinase                        DMSO     10 μM 
CRT 0101106    LIM Kinase                                   DMSO     10 μM 
H89            Protein Kinase A (PKA)/Rho kinase (ROCK)     H2O      10 μM 
Tat-C3         RhoA                                         H2O      10 μM 
H1152          Rho Kinase (ROCK)                            H2O      10 μM 
Y27632         Rho Kinase (ROCK)                            H2O      10 μM 
HA1077         Rho Kinase (ROCK)                            DMSO     10 μM 
ML-7           MLC kinase                                   H2O      10 μM 
Blebbistatin   Myosin II                                    H2O      10 μM 
Extended Data Figure 1 | Screen for inhibitors of PDPN-mediated cell 
contractility. a, Quantification of proportion of contracted NIH/3T3 
fibroblasts expressing enhanced (e)GFP control or PDPN-CFP and treated 
with the indicated inhibitors or vehicles. Statistically significant inhibition: 
****P < 0.00001 and *P = 0.01, Fisher’s exact test. Data represent mean ± s.d. 
of three independent experiments. b, Chemical inhibitors used in a and their 
targets. c, Contraction score of PDPN-expressing NIH/3T3 fibroblasts 
transfected with siRNA smartpools targeting the indicated Rho GEFs 
(MU-046870-01-0002, MU-040120-00-0002, MU-047092-01-0002, 
MU-041056-01-0002, Dharmacon, GE Healthcare). **P < 0.05, one-way ANOVA. 
d, Maximum length of FRCs in collagen gels measured in three dimensions 
from 100-μm-deep confocal z-stacks. Each point represents one FRC. 
**P < 0.05, one-way ANOVA. e, PCR analysis of Rho GEF mRNA expression 
in FRC cell lines after siRNA knockdown in comparison to expression of 
GAPDH. 
Extended Data Figure 2 | Generation of FRC lines. Comparison of an FRC 
cell line generated by immortalization of primary FRCs (bottom, blue) with 
primary FRCs in lymph node (LN) cell suspensions cultured for 7 days (top, 
red). Grey histograms indicate isotype-matched control antibody staining. 
Histograms of primary lymph node cultures are gated on CD45− CD140a+ 
cells to exclude haematopoietic cells and other stromal subsets. Histograms of 
the FRC cell line are gated only on live cells. 
Extended Data Figure 3 | Loss of PDPN results in FRC spreading and actin 
polymerization. a, Single optical slice (1 μm) showing morphology and pMLC 
organization of control and PDPN knockdown (KD) FRCs. pMLC (S19) 
(green), F-actin (red). Scale bar, 50 μm. b, Quantification of the number of 
ARP2/3-positive protrusions per FRC comparing control (green) and PDPN 
knockdown (red) cell lines. Data represent mean ± s.d., each point represents 
an individual FRC. P < 0.0001, Mann-Whitney U-test. c, Quantification of 
tail retraction defects comparing control and PDPN knockdown FRCs. Data 
are collated from >100 cells; P < 0.0001, Fisher’s exact test. ‘Present’ means 
that tail retraction defects were deemed present by the observer. 
Extended Data Figure 4 | Loss of pMLC and F-actin filaments after 
treatment of FRCs with CLEC-2. Single optical slices (1 μm) of FRC cell lines 
treated for 30 min with 10 μg ml⁻¹ soluble CLEC-2-Fc protein, fixed and 
stained for pMLC (S19) (green) and F-actin (red). Scale bars, 50 μm. Higher 
magnification shown to the right. Scale bars, 5 μm. 
Extended Data Figure 5 | Elongated morphology of CLEC-2-treated FRCs 
in three-dimensional cultures. Quantification of maximum cell length for 
100-μm-deep z-stacks. FRCs cultured in three-dimensional collagen/matrigel 
matrix for 3 days treated with CLEC-2-Fc, ROCK inhibitor 10 μM (Y27632), 
or stably knocked down (KD) for PDPN expression. Left, projection of the 
three-dimensional stack; staining F-actin (red), DNA (cell nuclei, blue). Scale 
bar, 200 μm. Centre, x,y,z coordinates and length of each vector for each end 
of each cell in three dimensions as quantified using Imaris image analysis 
software. Right, example of cell morphology in each treatment group; staining 
F-actin (red), DNA (cell nuclei, blue). Scale bar, 50 μm. 
Extended Data Figure 6 | Time course of lymph node expansion after 
OVA/CFA immunization. Total cellularity, number of T cells (CD45+, 
CD3+), B cells (CD45+, CD19+) and blood endothelial cells (BECs; 
CD45− PDPN− CD31+) in draining lymph nodes (LNs) at different times after 
OVA/CFA immunization. Each point represents one lymph node and data 
show mean ± s.d. of two independent experiments scoring 8-12 mice. Day 0 
represents lymph nodes from non-immunized mice. *P < 0.05, **P < 0.001, 
***P < 0.0001, ****P < 0.00001, differences between non-immunized and 
immunized mice as calculated using one-way ANOVA test followed by Tukey’s 
multiple comparisons test. 
Extended Data Figure 7 | FRCs are selectively labelled in PDGFRαKI-H2B-GFP 
mice. a, Analysis of skin draining lymph nodes from PDGFRαKI-H2B-GFP 
mice showing lymph node stromal cells co-expressing CD140a 
(PDGFRα) and GFP. Left, gate for CD45− stroma; right, GFP and CD140a 
expression of CD45− gate. b, Flow cytometry analysis showing that GFP+ 
cells are CD140a+. Left, gating for GFP+ lymph node cells; right, CD140a 
expression of GFP+ gate (green) compared with CD45− cells (grey). c, z-Stack 
of PDGFRαKI-H2B-GFP lymph node imaged ex vivo using two-photon 
microscopy. FRC nuclei (green), second harmonic signal (collagen) (blue). 
Wheat germ agglutinin AF647 (red) was injected subcutaneously 5 min 
before lymph node extraction to label conduits. Scale bar, 200 μm. 
d, Immunohistochemical staining of paraffin-embedded sections of lymph 
nodes from PDGFRαKI-H2B-GFP mice. Staining GFP (brown) and PDPN 
(pink), counterstained with haematoxylin (blue). Scale bar, 200 μm. 
Extended Data Figure 8 | Generation and characterization of CD11cΔCLEC-2 
mice. a, Scheme of targeting approach to allow conditional deletion of Clec1b 
exons 2, 3 and 4. loxP sites are shown in yellow. b, Clec1b mRNA in 
lipopolysaccharide (LPS)-treated bone marrow dendritic cells (BMDCs) or 
freshly isolated CD11c+ splenocytes from CD11cΔCLEC-2 mice and Cre− 
littermates. Data are represented as relative expression compared to control and 
depict mean ± s.d. from six replicates from two independent experiments. 
P values were calculated using Student’s t-test. c, Quantification of 
bone-marrow-derived dendritic cell (DC) morphology cultured in contact with 
FRCs. Data indicate score of perimeter²/(4π × area), with area and perimeter 
calculated from immunofluorescence imaging using ImageJ software. Higher 
scores indicate increased elongation and/or protrusions. P = 0.0007, 
Mann-Whitney U-test. d, Representative images from c showing dendritic cells 
spreading over FRCs. F-actin (red), cell nuclei (blue). Scale bar, 20 μm. 
e, f, Total dendritic cell numbers (e) and total FRC numbers (f) in steady-state 
skin draining lymph nodes of control versus CD11cΔCLEC-2 mice. Each data 
point represents one lymph node. g, PDPN surface expression by FRCs from 
control and CD11cΔCLEC-2 mice as measured by flow cytometry and 
represented relative to the control group. MFI, mean fluorescence intensity. 
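
The morphology score used in panel c, perimeter squared divided by 4π times area, can be illustrated with a short Python sketch. The function name and the test shapes are hypothetical; the score equals 1 for a perfect circle and rises as a cell outline becomes more elongated or develops protrusions.

```python
import math


def morphology_score(perimeter, area):
    """Shape score perimeter^2 / (4 * pi * area); 1.0 for a perfect circle."""
    if area <= 0:
        raise ValueError("area must be positive")
    return perimeter ** 2 / (4 * math.pi * area)


# Illustrative shapes: a circle of radius 2 and a 1 x 10 rectangle.
circle_score = morphology_score(perimeter=2 * math.pi * 2, area=math.pi * 2 ** 2)
rectangle_score = morphology_score(perimeter=2 * (1 + 10), area=1 * 10)
```

The rectangle scores well above the circle, which is the behaviour the legend describes for elongated cells.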
Diabetes recovery by age-dependent conversion of 
pancreatic δ-cells into insulin producers 
Total or near-total loss of insulin-producing β-cells occurs in type 1 
diabetes. Restoration of insulin production in type 1 diabetes is thus 
a major medical challenge. We previously observed in mice in which 
β-cells are completely ablated that the pancreas reconstitutes new 
insulin-producing cells in the absence of autoimmunity. The process 
involves the contribution of islet non-β-cells; specifically, 
glucagon-producing α-cells begin producing insulin by a process of 
reprogramming (transdifferentiation) without proliferation. Here we show the 
influence of age on β-cell reconstitution from heterologous islet cells 
after near-total β-cell loss in mice. We found that senescence does not 
alter α-cell plasticity: α-cells can reprogram to produce insulin from 
puberty through to adulthood, and also in aged individuals, even a 
long time after β-cell loss. In contrast, before puberty there is no 
detectable α-cell conversion, although β-cell reconstitution after injury is 
more efficient, always leading to diabetes recovery. This process occurs 
through a newly discovered mechanism: the spontaneous en masse 
reprogramming of somatostatin-producing δ-cells. The juveniles 
display ‘somatostatin-to-insulin’ δ-cell conversion, involving 
dedifferentiation, proliferation and re-expression of islet developmental 
regulators. This juvenile adaptability relies, at least in part, upon the 
combined action of FoxO1 and downstream effectors. Restoration 
of insulin-producing cells from non-β-cell origins is thus enabled 
throughout life via δ- or α-cell spontaneous reprogramming. A 
landscape with multiple intra-islet cell interconversion events is 
emerging, offering new perspectives for therapy. 

To determine how ageing affects the mode and efficiency of β-cell 
reconstitution after β-cell loss, we administered diphtheria toxin (DT) 
to adult (2-month-old) or aged (1- and 1.5-year-old) RIP-DTR mice, whose 
β-cells bear DT receptors, and followed them for up to 14 months. 
Collectively, we found that α-to-β-cell conversion is the main mechanism 
of insulin cell generation after massive β-cell loss in adult post-pubertal 
mice, whether middle-aged or very old, and that α-cells are progressively 
recruited into insulin production with time (Extended Data Fig. 1 and 
Supplementary Tables 1-5). 

We focused on regeneration potential during early postnatal life by 
inducing β-cell ablation before weaning, at 2 weeks of age (Fig. 1a). We 
found that prepubescent mice rapidly recover from diabetes after 
near-total β-cell loss: 4 months later all mice were almost normoglycaemic, 
thus displaying a faster recovery relative to adults (Fig. 1b and Extended 
Data Fig. 2a, b; see also Extended Data Fig. 1a). 

Histologically, 99% of β-cells were lost at 2 weeks after DT 
administration (Fig. 1c). The β-cell number increased by 45-fold 4 months 
after ablation, representing 23% of the normal age-matched β-cell mass 
(Fig. 1c and Supplementary Table 6) and correlating with recovery of 
normoglycaemia. 

All animals remained normoglycaemic for the rest of their life (Sup- 
plementary Table 6). Mice were neither intolerant to glucose nor insu- 
lin resistant during the period of analysis, up to 15 months after injury 
(Extended Data Fig. 2c-e). 


We investigated whether the new insulin+ cells were reprogrammed 
α-cells, as in adults, using glucagon-rtTA; TetO-Cre; R26-YFP; RIP-DTR 
pups (Fig. 1d). We observed that almost no insulin+ cells co-expressed 
yellow fluorescent protein (YFP) or glucagon (Supplementary Table 7), 
indicating that α-cells do not reprogram in juveniles. 

We further explored the age-dependency of rescue after near-total β-cell 
loss. To this aim, normoglycaemic 5-month-old mice, which had 
recovered from β-cell loss at 2 weeks of age, were re-administered DT to ablate 
the regenerated insulin+ cells. One month following the second ablation, 
30% of the insulin-containing cells also contained glucagon (Extended 
Data Fig. 2f and Supplementary Table 8), like β-cell-ablated adults 
(Extended Data Fig. 1k), confirming that the pre-pubertal regeneration 
mechanism is restricted temporally. 

We measured proliferation rates at different time-points over 2 months 
of regeneration. The proportion of Ki67-labelled insulin+ cells was very 
low (Extended Data Fig. 2g and Supplementary Table 9), indicating 
that neither escaping β-cells nor regenerated insulin+ cells proliferate 
during this period. However, there was a transient 3.5-fold increase in 
the number of insular Ki67+ cells 2 weeks after ablation, unlike in adult 
animals (Extended Data Fig. 2h and Supplementary Table 10). 
Replicating cells were hormone-negative, chromogranin-A-negative, and 
were not lineage traced to either α- or escaping β-cells (Extended Data 
Fig. 2i, j). 

Coincident with the peak of islet cell proliferation, we noticed in pups 
a 4.5-fold decrease in the number of somatostatin (Sst)-producing δ-cells 
(from 13 to 3 δ-cells per islet section; Extended Data Fig. 3a and 
Supplementary Table 11) and a 76-fold decrease of Sst transcripts (Extended 
Data Fig. 3b), without any indication of increased islet cell death. We 
therefore lineage traced δ-cells and observed that regenerated 
insulin-producing cells were dedifferentiated δ-cells. At 2 months of age in 
Sst-Cre; R26-YFP; RIP-DTR mice, about 81% of δ-cells were YFP+ in 
the absence of β-cell ablation, whereas α- and β-cells were labelled at 
background levels (0.9% for β-cells and 0.2% for α-cells; Extended Data 
Fig. 3c, d and Supplementary Table 12). During β-cell reconstitution in 
pups, 2 weeks after β-cell ablation, 80% of YFP+ cells were proliferating 
(Ki67+) and Sst-negative (Fig. 2a, b and Supplementary Table 13), while 
most Ki67+ cells were YFP-labelled (85%; Supplementary Table 14). 

These observations suggest that in β-cell-ablated pre-pubertal mice 
most δ-cells undergo a loss of Sst expression and enter the cell cycle. 

We further investigated the fate of proliferating dedifferentiated δ-cells. 
At 1.5 months post-ablation, most insulin+ cells expressed YFP (90%), 
indicating their δ-cell origin (Fig. 2c, d and Supplementary Table 15). 
Furthermore, in contrast to non-ablated age-matched controls, where 
all YFP+ cells were Sst+ (>99%), about half of YFP+ cells were insulin+ 
after 1.5 months of regeneration (45%; Fig. 2e and Supplementary 
Table 16). This reveals that half of the progeny of dedifferentiated δ-cells 
becomes insulin expressers. Bihormonal Sst+/insulin+ cells were rare 
(Supplementary Table 17). 
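
As a back-of-the-envelope check on the lineage-tracing percentages above, the following Python sketch computes the insulin+ share expected among traced cells under a simple one-division model. The model and the function name are assumptions made for illustration, not the authors' analysis. It treats the 80% of YFP+ cells that were Ki67+ at 0.5 mpa as the fraction that divides once, assumes each division gives one insulin+ and one Sst+ daughter, and assumes non-dividing traced cells stay Sst+.

```python
def expected_insulin_fraction(dividing_fraction):
    """Expected insulin+ share of traced cells under a one-division model.

    Assumptions (illustrative only):
    - a fraction `dividing_fraction` of traced cells divides exactly once;
    - each division gives one insulin+ daughter and one Sst+ daughter;
    - non-dividing traced cells remain Sst+.
    """
    if not 0.0 <= dividing_fraction <= 1.0:
        raise ValueError("dividing_fraction must lie between 0 and 1")
    insulin_cells = dividing_fraction      # one converted daughter per division
    total_cells = 1.0 + dividing_fraction  # every division adds one cell
    return insulin_cells / total_cells


# 80% of traced cells were proliferating at 0.5 mpa (value taken from the text).
predicted = expected_insulin_fraction(0.80)
print(f"predicted insulin+ share: {predicted:.1%}")
```

Under these assumptions the predicted share is about 44%, close to the 45% of YFP+ cells reported as insulin+ at 1.5 mpa. The agreement is only indicative, since Ki67 labelling at a single time point is a rough proxy for the fraction of cells that divide.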
Figure 1 | β-cell ablation before puberty and diabetes recovery. 
a, Experimental designs depicting the ages at DT administration and the 
various analyses. mpa, months post-ablation. b, Comparative evolution of 
glycaemia in β-cell-ablated juveniles (n = 5) and middle-aged adults (n = 4); 
2.5 months after β-cell ablation, insulin administration was stopped 
(P = 0.0014, Mann-Whitney test). Dotted line shows upper limit of 
normoglycaemia (10 mM). c, Islets from 2-week-old mice with no DT 
treatment (control), 1-month-old mice at 0.5 mpa and 4.5-month-old mice at 
4 mpa (Supplementary Table 6). Percentages refer to β-cell mass relative to 
age-matched unablated controls. DAPI, 4′,6-diamidino-2-phenylindole. 
d, α-Cell tracing in pups. DOX, doxycycline. Scale bars, 20 μm. 

Combined, these observations show that at the cell population level, 
each dedifferentiated δ-cell yields one insulin expresser cell and one Sst+ 
cell (Extended Data Fig. 4). 

We confirmed with two other assays that regeneration and diabetes 
recovery in juvenile mice are δ-cell-dependent: by inducing β-cell 
destruction with streptozotocin (STZ) instead of DT (Extended Data Fig. 5a-c), 
and by co-ablating β- and δ-cells simultaneously in Sst-Cre; R26-YFP; 
R26-iDTR; RIP-DTR pups. In the absence of δ-cells there was no 
insulin+ cell regeneration, and no recovery (Fig. 2f). 

In adults, δ-cells neither dedifferentiated nor proliferated after β-cell 
ablation (Extended Data Fig. 5d, e and Supplementary Table 20). 
Nevertheless, like α-cells, a few δ-cells reprogrammed into insulin production, 
so that after 1.5 months of regeneration 17% of the rare insulin-producing 
cells were YFP+, that is, δ-cell-derived (Extended Data Fig. 5f-h and 
Supplementary Tables 21, 22). 

By transplanting Sst-Cre; R26-YFP; RIP-DTR juvenile islets into adult 
wild-type mice we observed that, following β-cell ablation, the newly 
formed insulin+ cells were reprogrammed δ-cells, thus showing that 
the pup-specific regeneration is intrinsic to islets (Extended Data Fig. 6). 

Contrary to β-cells in age-matched adult mice, δ-cell-derived insulin+ 
cells replicated transiently (Extended Data Fig. 7a and Supplementary 
Table 23); the β-cell mass thus reached between 30% and 69% of the normal 
values, and remained stable for life (see earlier; Supplementary Table 6). 

We characterized the δ-cell-derived insulin+ cells at the gene 
expression level by quantitative polymerase chain reaction (qPCR). We first 
compared islets isolated 2 weeks after β-cell ablation or after recovery 
(4 months post-DT) with age-matched control islets. Expression of all 
the β-cell-specific markers tested was robustly increased in recovered 
mice (Extended Data Fig. 7b). We also compared regenerated insulin+ 
cells with native β-cells using sorted mCherry+ cells obtained from either 
recovered or unablated age-matched (4.5-month-old) insulin-mCherry; 
RIP-DTR mice (Extended Data Fig. 7c). The two cell populations were 
very similar (Extended Data Fig. 7d), yet the δ-cell-derived replicating 
β-cells displayed a potent downregulation of cyclin-dependent kinase 


©2014 Macmillan Publishers Limited. All rights reserved 


[Figure 2 graphic. Recoverable labels: Sst-Cre; R26-YFP; RIP-DTR (panels a, c); 
Ki67+/YFP+ (% cells) in Ctrl pups and DT pups at 0.5 mpa (panel b); β-cell ablation 
(Sst-Cre; R26-YFP; RIP-DTR), δ-cell ablation (Sst-Cre; R26-YFP; R26-iDTR) and 
β- and δ-cell co-ablation (Sst-Cre; R26-YFP; R26-iDTR; RIP-DTR), with the last 
insulin implant marked (panel f); conversion sequence from 0.5 to 4 mpa (panel g).] 


Figure 2 | δ-cells dedifferentiate, proliferate and reprogram into insulin 
production after extreme β-cell loss in Sst-Cre; R26-YFP; RIP-DTR juvenile 
mice. a, Immunofluorescence for YFP and Ki67 at 0.5 mpa. b, Eighty per cent 
of Sst-traced YFP+ cells are Ki67+ after β-cell ablation (controls: n = 6; 2,754 
YFP+ cells scored; DT: n = 6; 3,146 YFP+ cells scored; P < 0.0001, Welch’s 
test; P = 0.0022, Mann-Whitney). Ctrl, control. c, d, At 1.5 mpa, 90% of 
insulin+ cells co-express YFP (controls: n = 3; 6,480 insulin+ cells scored; DT: 
n = 7; 1,592 insulin+ cells scored; P < 0.0001, Welch’s test; P = 0.0167, 
Mann-Whitney). Arrow indicates YFP+/Sst+ cells; arrowhead indicates YFP+/ 
insulin+ cells. e, In controls, 99.9% of the YFP+ cells are Sst+ (n = 3; 1,673 
YFP+ cells scored). In contrast, at 1.5 mpa only 55% of the YFP+ cells are Sst+, 
while 45% of the YFP+ cells are insulin+ (n = 5; 2,295 YFP+ cells scored; 
P < 0.0001, Welch’s test; P = 0.0357, Mann-Whitney). f, Comparative 
evolution of glycaemia after β-cell (n = 5), δ-cell (n = 4) and β- and δ-cell 
co-ablation (n = 5) in juveniles. g, δ-cell conversion sequence. Scale bars, 
20 μm. Error bars show standard deviation (s.d.). ****P < 0.0001. 


inhibitors and regulators (Extended Data Fig. 7e, f). This suggests that 
reconstituted insulin+ cells are like β-cells with transient proliferation 
capacity. Future studies will establish whether reconstituted (δ)β-like 
cells are true equivalents to native β-cells. 

qPCR and lineage-tracing analyses on islets isolated from pups at 
different regeneration time-points, together with Ngn3 (also known as 
Neurog3) knockout induction after β-cell ablation, revealed that Ngn3 
transcription is required for the δ-to-insulin+ cell conversion to occur 
(Extended Data Fig. 8a-k and Supplementary Tables 24-29). Of note, the 
brief expression of Ngn3 is a feature of islet precursor cells in the 
embryonic pancreas. Together, these observations are compatible with a model 
in which β-cell reconstitution after ablation in juveniles occurs 
following a defined sequence of events: δ-cells dedifferentiate, replicate once, 
and then half of the progeny activates Ngn3 expression before insulin 
production (Fig. 2g). This was tested in a combined double lineage-tracing 
experiment using Sst-Cre; R26-Tomato; Ngn3-YFP; RIP-DTR mice. Six 
weeks after β-cell ablation, insulin+ cells in juveniles were Tomato+/ 
YFP+ (Extended Data Fig. 8k). 

One key reprogramming and cell cycle entry player is FoxO1, a 
transcription factor whose downregulation triggers Ngn3 expression in human 
fetal pancreatic explants and favours insulin production in Ngn3+ 
enteroendocrine progenitors. FoxO1, usually in cooperation with TGF-β/ 
SMAD signalling, inhibits cell proliferation through the 
transcriptional regulation of cell cycle inhibitors and activators, and is involved 
in cellular senescence (Extended Data Fig. 9a). We next explored the 
FoxO1 molecular network in purified adult or juvenile δ-cells before and 
after (1 week) β-cell ablation, using Sst-Cre; R26-YFP; RIP-DTR mice. 

δ-cells displayed divergent regulation of Foxo1 in injured juvenile and 
adult mice. Consistent with Foxo1 downregulation in juvenile δ-cells, 
Pdk1 and Akt (also known as Akt2) levels were increased, Cdkn1a (also 
known as p21) and Cdkn2b (also known as p15Ink4b) were 
downregulated, and Cks1b, Cdk2 and Skp were upregulated (Fig. 3a), which is 
compatible with the proliferative capacity of juvenile δ-cells after β-cell 
ablation. The opposite was found in the δ-cells of ablated adults (Fig. 3a 
and Extended Data Fig. 9b). 

Moreover, in δ-cells of juveniles, but not in adults, there was a robust 
upregulation of BMP1/4 downstream effectors (Fig. 3b). Inversely, 
TGF-β pathway genes were upregulated in δ-cells of regenerating adults 
(Fig. 3b), which is compatible with the senescence scenario involving 
PI3K/FoxO1 and TGF-β/SMAD cooperation to maintain 
differentiation and cycle arrest (Extended Data Fig. 9a, b). 

In summary, PI3K/AKT and SKP2/SCF pathways potentially 
cooperate to downregulate Foxo1 in δ-cells of regenerating juveniles. Also, 
upregulation of BMP effectors (Id1 and Id2) could contribute to δ-cell 
dedifferentiation and proliferation, as observed in other systems 
(Fig. 3c). Conversely, the PI3K/AKT pathway remained downregulated 
in δ-cells of ablated adults, which would allow FoxO1 to impede 
proliferation and dedifferentiation, probably through partnership with 
previously described SMADs (Extended Data Fig. 9b). 

We next checked whether a transient FoxO1 inhibition in adult mice 
would lead to a juvenile-like δ-to-β-cell conversion. Indeed, inactivation 
of FoxO1 in β-cells causes their dedifferentiation. Here, Sst-Cre; 
R26-YFP; RIP-DTR β-cell-ablated adult mice were given a FoxO1 inhibitor 
(AS1842856) for 1 week, either immediately following ablation (Fig. 3d) or 
1 month later (Extended Data Fig. 10f and Supplementary Tables 37-39). 
While FoxO1 inhibition in non-ablated controls had a minimal effect 
on insulin expression (Extended Data Fig. 10a-d and Supplementary 
Tables 30-32), regeneration in diabetic mice was improved: insulin+ 
cells were more abundant (11-fold; Fig. 3e, f and Supplementary Table 33), 
and were reprogrammed δ-cells (93% were YFP+, Fig. 3g and 
Supplementary Table 34). One-fourth of the YFP+ cells expressed insulin 
only (Fig. 3h, Extended Data Fig. 10e and Supplementary Tables 35, 36), 
revealing that, like in juveniles, an important fraction of δ-cells had 
converted to insulin production. 

These results support the involvement of a regenerative FoxO1 
network and confirm that δ-cell conversion can be pharmacologically 
induced in diabetic adults. FoxO1 blockade has a pleiotropic effect: 
inhibition of hepatic gluconeogenesis and, as we have shown, promotion 
of δ-cell reprogramming. 

A century ago Morgan coined the terms ‘epimorphosis’ and 
‘morphallaxis’ to designate, respectively, regeneration involving either cell 
dedifferentiation and proliferation or direct conversion from one cell 
type into another without proliferation. Here we report in mammals 
an age-dependent switch (‘adult transition’) between epimorphic 
regeneration during youth and a less efficient, yet lifelong, 
proliferation-independent morphallactic mechanism. 

Our findings uncover a novel role for δ-cells; perhaps Sst+ cells in 
the stomach, intestine or hypothalamus share the same capabilities. 
Intra-islet cell plasticity triggered by the disappearance of β-cells is influenced 
by age: the proliferation decline in ageing cells would explain the need 
for an adult transition. Although less efficient, α-cell plasticity persists 
long after β-cell loss because it is proliferation-independent. 
Figure 3 | Age-dependent effect of β-cell loss on δ-cells. a, b, Transcriptional 
variation of cell cycle regulators, PI3K/AKT/FoxO1 network genes (a), and 
TGF-β and BMP components and effectors (b) in juvenile and adult δ-cells 
1 week after ablation, as compared with age-matched controls. wpa, weeks 
post-ablation. c, β-cell loss before puberty triggers FoxO1 downregulation in 
δ-cells, while the opposite occurs in adults (see Extended Data Fig. 9b). 
d, Experimental design to transiently inhibit FoxO1 in β-cell-ablated adult 
mice. e, Induction of δ-to-insulin cell conversion in diabetic adult mice. Dashed 
box indicates the area that is magnified in the right-most panel. Arrowheads 
indicate a converted δ-cell, which has lost Sst expression (insulin+ and YFP+ 
cell). Arrows point to an unaffected δ-cell, which is Sst+ and YFP+, and does 

These phenomena might be translatable to humans, as there is 
efficient β-cell regeneration in children with type 1 diabetes or after 
pancreatectomy, and glucagon/insulin bihormonal human cells have 
been described upon epigenetic manipulation ex vivo, and in diabetic 
patients. Knowing also that only a small fraction of the α-cell 
population is sufficient to maintain glucagon signalling, understanding the 
nature of the diverse forms of intra-islet cell conversion might provide 
new opportunities for fostering the formation of (α)β-like and 
(δ)β-like cells. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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Extended Data Figure 1 | Maintenance of a-cell plasticity in diabetic aged 
mice. a, Evolution of glycaemia in B-cell-ablated adults (middle-aged) and 
aged mice. The area under the curve (AuC) in middle-aged (2-month-old, 

= 4) and aged (1- and 1.5-year-old, n = 5 and n = 3) mice before and after 
stopping insulin administration revealed no statistical difference between 
groups (Welch’s test, Po_45mpa= 0.1029, 0.3321; Pa.s-7mpa = 0.1748, 0.50075 
one-way analysis of variance (ANOVA), P = 0.1161, P = 0.2681; and Mann- 
Whitney, P = 0.1640, 0.4519). b, Evolution of glycaemia in 14 aged mice over a 
period of 14 months post-ablation (mpa). Mice were treated with insulin for 4.5 
months; most of them (5/7 in each group) subsequently recovered from 
diabetes. c—e, Pancreatic islets before (c) and after (d, e) B-cell ablation in 
1.5-year-old mice; }}-cell mass increases 3.5-fold between 0.5 and 1 mpa, 12-fold 
at 7 mpa and 32-fold at 14 mpa, in all age groups. Percentages (0.3% and 4.4%) 
indicate B-cell mass relative to unablated controls (Supplementary Table 1). 
Two-month-old: 0.5 mpa, n = 4; 1 mpa, n = 4; 7 mpa, n = 4. One-year-old:
0.5 mpa, n = 5; 1 mpa, n = 5; 7 mpa, n = 5; 14 mpa, n = 8. 1.5-year-old:
0.5 mpa, n = 3; 1 mpa, n = 3; 7 mpa, n = 3; 14 mpa, n = 8. f, β-cell
proliferation is very low in aged mice, whether control (0.5%; n = 8; 39,790
insulin+ cells scored) or ablated (0.2%; n = 6; 938 insulin+ cells scored)
(Supplementary Table 2). g, Proportion of insulin+ cells also containing
glucagon after DT is not different between groups (Supplementary Table 3).
Control: 2-month-old, n = 3; 1-year-old, n = 3; 1.5-year-old, n = 3. 0.5 mpa:
2-month-old, n = 5; 1-year-old, n = 3; 1.5-year-old, n = 6. 1 mpa: 2-month-old,
n = 4; 1-year-old, n = 6; 1.5-year-old, n = 4. 7 mpa: 2-month-old, n = 5;
1-year-old, n = 5; 1.5-year-old, n = 6. One-way ANOVA (P = 0.6796, 0.4297,
0.9266, 0.2411); note that 40% of the cells containing insulin at 1 mpa also
contained glucagon. The proportion of glucagon+/insulin+ cells remains constant
between 0.5 and 7 mpa, while the number of insulin+ cells increases with time
(e; Supplementary Table 1), suggesting that there is a cumulative recruitment of
α-cells into insulin production. h, Islet with YFP+/glucagon+/insulin+ cells
in 1-year-old glucagon-rtTA; TetO-Cre; R26-YFP; RIP-DTR mice, 7 mpa; rtTA
expression allows the selective irreversible YFP labelling of adult α-cells
upon administration of doxycycline (DOX) before β-cell ablation. i, Proportion
of YFP-labelled insulin-expressing cells in DOX-treated mice. Eighty per
cent of insulin+ cells are YFP+ after 7 mpa, in all age groups (Supplementary
Table 4). Control: 2-month-old, n = 3; 1-year-old, n = 3; 1.5-year-old, n = 3.
1 mpa: 2-month-old, n = 9; 1-year-old, n = 3; 1.5-year-old, n = 3. 7 mpa:
2-month-old, n = 5; 1-year-old, n = 5; 1.5-year-old, n = 5. One-way ANOVA
(P = 0.9417, 0.8910, 0.9641).
j, k, YFP+/glucagon+/insulin+ cells at 7 mpa, following DOX pulse-labelling at
5.5 months after β-cell loss (Supplementary Table 5). Control: 1-year-old,
n = 5; 1.5-year-old, n = 5. 7 mpa: 1-year-old, n = 5; 1.5-year-old, n = 5.
Welch's correction (P = 0.8272, 0.8926), Mann-Whitney (P = 0.9444). On average,
15% of the insulin+ cells found were YFP-labelled, some of which no longer
contained glucagon as in j, bottom row. Note the decreased proportion of
YFP-labelled insulin+ cells when α-cells are tagged late after ablation (from
80% to 15%; compare i and k), and the presence of YFP-labelled
insulin+/glucagon-negative cells in the latter situation (j), suggesting that
bihormonal α-cells slowly but gradually lose glucagon gene activity. Scale
bars, 20 μm. Error bars show s.d.
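
The captions above report Welch's test repeatedly. As a rough illustration only (the input values below are hypothetical and are not data from this study, and the function name is an assumption made for the example), a minimal Python sketch of the Welch t statistic and its Welch-Satterthwaite degrees of freedom could look like this:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test.

    Illustrative helper, not taken from the paper. Each sample needs
    at least two observations so that a sample variance exists.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group squared standard errors (sample variance / group size).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    # Welch t statistic: difference of means over the combined standard error.
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical measurements for two groups of unequal size.
t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8, 10])
print(round(t, 4), round(df, 4))
```

Converting t and df into a P value additionally needs the Student t distribution, which is left out here to keep the sketch within the standard library.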
Extended Data Figure 2 | Diabetes recovery in pre-pubertal mice.

a, Evolution of glycaemia (AuC) between 2.5 and 4 mpa, in pups and adults (see
Fig. 1b) (Welch's test, P = 0.0188). b, qPCR of insulin 2 messenger RNA after
β-cell ablation; insulin 2 transcripts are 25-fold more abundant in pups than
in adults at 2 mpa (n = 3 mice per group, each individual sample was run in
triplicate in each reaction for a total of three independent reactions). Built-in
Welch's test (P = 0.0134, 0.0049). Error bars show s.d. c, Glucose tolerance tests
(IPGTT) for DT-treated (4.5 mpa, n = 4) and age-matched controls (n = 4);
note the fold increase between glucose injection and the glycaemic peak during
IPGTT for each animal, and fold decrease between glycaemic peak and T120
(two-tailed unpaired t-test, P = 0.5836 and P = 0.4937, respectively). d, Plasma
insulin at time point (in min) T0, T15 and T30 during the IPGTT. Control: n = 4;
DT: n = 4; two-tailed paired t-test (P = 0.0008). e, Insulin tolerance tests (ITT)
performed 1.5 years after β-cell ablation at 2 weeks of age. Controls: n = 7;
DT: n = 10. f, 4.5 months after β-cell ablation (at 2 weeks), three mice became
normoglycaemic and received a second treatment with DT. Ablation of
regenerated insulin+ cells in recovered mice leads to the appearance of
glucagon+/insulin+ cells, corresponding to the type of 'α-cell-dependent'
regeneration observed in adults (31% of insulin+ cells also contained glucagon;
Supplementary Table 8). Arrow indicates glucagon+/insulin+ bihormonal cell.
cells scored; DT: 0.5 mpa, n = 5, 412 insulin+ cells scored; 1.5 mpa, n = 3,
675 insulin+ cells scored; Welch's test (P = 0.1197, P = 0.0688). Error bars
show s.e.m. h, Islet cell proliferation is increased (3.5-fold; Ki67+ cells) in
islets of DT-treated pups at 0.5 mpa. Control: 1-month-old, n = 3, 95 islets
scored; 1.5-month-old, n = 3, 94 islets scored; 2-month-old, n = 3, 90 islets
scored; 2.5-month-old, n = 3, 89 islets scored; 3-month-old, n = 3, 91 islets
scored; 3.5-month-old, n = 3, 93 islets scored; 18.5-month-old, n = 3, 83 islets
scored; 19-month-old, n = 3, 83 islets scored; 19.5-month-old, n = 3, 88 islets
scored. DT (2-week-old): 0.5 mpa, n = 6, 333 islets scored; 1 mpa, n = 3, 91
islets scored; 1.5 mpa, n = 3, 90 islets scored. DT (2-month-old): 0.5 mpa,
n = 3, 76 islets scored; 1 mpa, n = 3, 77 islets scored; 1.5 mpa, n = 3, 81
islets scored. DT (1.5-year-old): 0.5 mpa, n = 3, 74 islets scored; 1 mpa,
n = 3, 81 islets scored; 1.5 mpa, n = 3, 77 islets scored. Error bars show s.d.
Welch's test, one-way ANOVA (P < 0.001), Mann-Whitney (P = 0.0238). i, Ki67+
cells are hormone- and chromogranin-A-negative; lineage-traced α- and
DT-spared β-cells are Ki67-negative. Scale bars, 20 μm.
Extended Data Figure 3 | δ-cell labelling and tracing in transgenic mice.

a, The number of Sst+ cells transiently decreases by 80% during the second
week after ablation. Control: n = 255 islets, 7 mice; 3 dpa: n = 240 islets,
5 mice; 5 dpa: n = 228 islets, 5 mice; 7 dpa: n = 251 islets, 5 mice; 0.5 mpa:
n = 267 islets, 6 mice; 1 mpa: n = 266 islets, 5 mice; 1.5 mpa: n = 206 islets,
5 mice. Error bars show s.d. Welch's test (P = 0.0008, 0.0229, 0.006, 0.035),
one-way ANOVA (P < 0.0001), Mann-Whitney (P = 0.0043). b, Relative Sst gene
expression sharply decreases 2 weeks after β-cell ablation in 2-week-old mice
(n = 3 mice per group, each individual sample of each experimental group was
run in triplicate, in three independent reactions). Built-in Welch's test
(P = 0.0002). Error bars show s.d. c, Sst-Cre; R26-YFP mice. Cre activity
efficiently and specifically occurs in δ-cells (box: enlarged cell). Scale bar,
20 μm. d, Quantitative values of reporter gene expression in islet cells (n = 4;
1,263 YFP+ cells scored).
Extended Data Figure 4 | δ-cells dedifferentiate, proliferate and reprogram
into insulin production after extreme β-cell loss in juvenile mice. Observed
and expected numbers of Sst+ and insulin+ cells per islet section, before
and after β-cell ablation. Cells scored after 6 weeks (Extended Data Fig. 3a)
correspond (χ² test) with estimates made assuming that dedifferentiated
proliferating δ-cells yield two types of progeny (as deduced from Fig. 2c, e).
Dashed arrows indicate phenotypic stability; plain arrows indicate dynamic
behaviour (dedifferentiation and replication).
Extended Data Figure 5 | Regeneration in streptozotocin-treated pups and
DT-treated adults. a, Immunofluorescence showing YFP-labelled insulin+
cells at 1.5 months following streptozotocin (STZ)-induced ablation of β-cells
in 2-week-old mice. Arrows indicate YFP+/insulin+ cells; arrowhead indicates
YFP+/Sst+ cell; asterisks indicate escaping β-cells. b, Number of remaining
β-cells per islet section at 2 weeks after streptozotocin or DT treatment in pups,
reflecting difference in ablation efficiency of the two methods (Supplementary
Table 18). STZ: n = 87 islets, 3 mice; DT: n = 361 islets, 4 mice. Welch's test
(inter-islet P < 0.0001; inter-individual P = 0.0109), Mann-Whitney
(P < 0.001). c, The number of YFP+/insulin+ cells per islet section at 1.5 mpa
is not significantly different between the two β-cell ablation methods
(Supplementary Table 19). STZ: n = 88 islets, 3 mice; DT: n = 193 islets, 7 mice.
Welch's test (P = 0.4786). d, δ-cell numbers per islet section in controls
(n = 3, 174 islets scored), 0.5 mpa (n = 4, 140 islets scored) and 1 mpa
(n = 3, 86 islets scored). Unpaired t-test, two-tailed (P = 0.6386; P = 0.5406).
e, Immunofluorescence for YFP and Ki67 2 weeks (0.5 mpa) after DT, in
Sst-Cre; R26-YFP; RIP-DTR mice. f, Experimental design for δ-cell tracing in
β-cell-ablated Sst-Cre; R26-YFP; RIP-DTR mice at 2 months of age, and
immunofluorescence for Sst, YFP and insulin at 1.5 mpa. Arrow indicates
YFP+/insulin+/Sst- cell. g, At 1.5 mpa, 17% of insulin+ cells co-express
YFP versus almost 100% in ablated prepubescent mice. Control: n = 4; DT:
n = 8; unpaired t-test, two-tailed (P = 0.0462). h, At 1.5 mpa, 98% of the YFP+
cells are Sst+, and 1% are insulin+ cells (versus 44% in mice ablated before
puberty; n = 8, unpaired t-test, two-tailed). Scale bars, 20 μm. Error bars
show s.d.
Extended Data Figure 6 | δ-to-β-cell conversion after β-cell ablation is
maintained in young islets ablated underneath the kidney capsule of adult
hosts. a, Islet transplantation design: 400-600 islets isolated from 2-week-old
Sst-Cre; R26-YFP; RIP-DTR transgenics were transferred under the kidney
capsule of 2-month-old immunodeficient (SCID) mice (n = 3).
b, Experimental design: after 1 week of engraftment, adult host mice were
DT-treated and left to regenerate for 6 weeks. c, δ-to-β conversion was observed
in β-cell-ablated engrafted islets, like in the pancreas of juvenile mice. Scale
bars, 20 μm.
Extended Data Figure 7 | Characterization of δ-cell-derived regenerated
insulin+ cells. a, Once differentiated from δ-cells (YFP+), the newly formed
β-cells re-enter the cell cycle (Ki67+ cells). Two waves of massive replication
occur, at 3 and 4 months after injury, respectively (Supplementary Table 23).
b, qPCR for β-cell-specific genes using RNA extracted from islets isolated
from control and DT-treated mice, either 2 weeks or 4 months after DT
administration (0.5 mpa and 4 mpa). Note that after an initial extreme
downregulation of all the β-cell-specific markers explored, their levels
significantly recover after 4 months, which correlates with the observed robust
regeneration and diabetes recovery. Values represent the ratio between each
regeneration time-point and its age-matched control. c, Experimental design.
d, qPCR comparison between regenerated mCherry+/insulin+ cells isolated
from mice 4 months after β-cell ablation, and mCherry+ β-cells obtained from
age-matched controls (4.5-month-old). All markers tested are expressed at
identical levels in both groups; non-β-cell markers are expressed at extremely
reduced levels (threshold cycle (CT) ranging from 28 to 31), showing the
same degree of purity in both types of cell preparations. e, f, Interestingly, in
contrast to bona fide β-cells isolated from 4.5-month-old controls, regenerated
insulin+ cells have lower levels of cyclin-dependent kinase inhibitors, FoxO1
and Smad3. This correlates with their increased proliferative capacity at this
specific time-point. Scale bars, 20 μm. qPCRs: n = 3 mice per group; each
individual sample of each experimental group was run in triplicate, in three
independent reactions; built-in Welch's test. Error bars show s.d.
Extended Data Figure 8 | Ngn3 activation is required for insulin expression
in dedifferentiated δ-cells. a, qPCR for Ngn3 mRNA after β-cell ablation
reveals a transitory fivefold upregulation of Ngn3 transcripts 6 weeks after β-cell
ablation when β-cell ablation is performed before puberty, but not in adult
mice. Controls: 11 -month-old = 33 11.5-month-old = 33 42-month-old = 65 
12.5-month-old — 3; 12.5-month-old — 3; 13-month-old — 3; 13,5-month-old — 
M14 month-old = 35 DT (2-week-old): Mo smpa = 35 ™ mpa = =6 

= 3; DT (2-month-old): 10.5 mpa = 3; 11 mpa = 35 11.5 mpa = 35 12 mpa 
Each individual sample (mouse) was run in triplicate, in each of three
independent reactions. Built-in Welch's test (P = 0.0112, 0.0178). b, Ngn3
transcriptional activity can be monitored in Ngn3-YFP knock-add-on mice
because Ngn3 promoter activity results in YFP expression. In non-ablated
age-matched control pups, or in ablated adults, no islet YFP+ cells were found
(data not shown), yet when β-cells are ablated at 2 weeks of age, 86% of insulin+
cells also express YFP at 1.5 mpa. Control: n = 3, 6,358 insulin+ cells scored;
DT: n = 3, 675 insulin+ cells scored; Welch's test (P = 0.0010). c, At 1.5 mpa,
81% of YFP+ cells co-express insulin, but no glucagon, Sst or PP (data not
shown). Two weeks later, YFP+ cells are almost absent, reflecting the
downregulation of Ngn3 expression reported in a, and suggesting that insulin+
cells originate from cells transiently activating Ngn3 expression after ablation.
Control: 1-month-old, n = 3; 1.5-month-old, n = 3; 2-month-old, n = 3;
2.5-month-old, n = 3; absent YFP+ cells in all control conditions. DT: 0.5 mpa,
n = 3, 31 YFP+ cells; 1 mpa, n = 3, 123 YFP+ cells; 1.5 mpa, n = 3, 729 YFP+
cells; 2 mpa, n = 3, 47 YFP+ cells. Welch's test and ANOVA (P < 0.0001).
d, Irreversible lineage tracing of Ngn3-expressing cells at 1 and 1.5 mpa upon
tamoxifen (TAM) administration in Ngn3-CreERT; R26-YFP; RIP-DTR mice;
immunofluorescence analyses reveal that in the absence of β-cell ablation,
there is no YFP induction (controls). In ablated mice, nearly all insulin+ cells
are YFP+ with time (arrows). At early time-points (1 mpa), YFP+/hormone-negative
cells are found: these are probably differentiating cells before insulin
expression.
e, f, In β-cell-ablated Ngn3-CreERT; R26-YFP; RIP-DTR pups, 91% of insulin+
cells co-express YFP (control: n = 3, 3,472 insulin+ cells scored; DT: n = 3,
489 insulin+ cells scored) (e) and inversely, 93% of the YFP+ cells are insulin+
(f) (control: n = 3; absent YFP+ cells in all control conditions; DT: n = 3, 478
YFP+ cells scored). g, Experimental design to block Ngn3 upregulation in
β-cell-ablated prepubescent mice by administering DOX to mice bearing five
mutant alleles: Ngn3-tTA+/+; TRE-Ngn3+/+; RIP-DTR. In these mice the Ngn3
coding region is replaced by a DOX-sensitive transactivator gene (tTA); the
endocrine pancreas develops normally because Ngn3 expression is allowed in
the absence of DOX by the binding of tTA to the promoter of the TRE-Ngn3
transgene. Pups were given DT at 2 weeks of age and then DOX 2 weeks later, to
block Ngn3 upregulation. They were euthanized when Ngn3 peaks after
ablation (2-month-old). h, Islets from non-ablated (no DT) and ablated (DT)
mice, exposed (Ngn3 inhibition) or not (normal Ngn3 expression) to DOX
treatment from 4 weeks of age. β-cell regeneration is efficient in the absence
of DOX (as previously shown), but decreases after Ngn3 blockade, resulting in
the appearance of glucagon/insulin bihormonal cells. i, Sharply decreased
regeneration by blocking Ngn3 expression in DOX-treated mice reveals the
requirement of Ngn3 for efficient β-cell regeneration in pups. DT: n = 266
islets scored, 3 mice; DT+DOX: n = 167 islets scored, 4 mice. Welch's test
(inter-islet P < 0.0001; inter-animal P = 0.0352), Mann-Whitney (P < 0.0001).
j, Glucagon+/insulin+ bihormonal cells appear in DOX-treated β-cell-ablated
pups (Ngn3 inhibition), suggesting a switch to an 'adult-like', less efficient,
mechanism of regeneration. Control+DOX: n = 3, 9,233 insulin+ cells scored;
DT: n = 3, 1,385 insulin+ cells scored; DT+DOX: n = 4, 141 insulin+ cells
scored. Welch's test (P = 0.0081), ANOVA (P < 0.0001). k, Combined double
lineage tracing of δ-cells (Tomato+) and Ngn3-expressing cells (YFP+) shows
by immunofluorescence that nearly all insulin+ cells express both reporters,
but no Sst (arrows). Sst+ cells (arrowheads) are YFP- and insulin-negative.
Scale bars, 20 μm. Error bars show s.d.
Extended Data Figure 9 | FoxO1 regulatory network. a, Cartoon depicting
the FoxO1 network involved in the regulation of cell cycle progression and
cellular senescence: FoxO1 arrests the cell cycle by repressing activators (cyclin
D1, cyclin D2) and inducing inhibitors (Cdkn1a, Cdkn1b, Cdkn2b, Cdkn1c)
(PMID: 10102273; PMID: 17873901). Cdkn1a and Cdkn2b activation, a sign of
cellular senescence (PMID: 17667954), is regulated by FoxO1 through direct
interaction with Skp2 protein. In turn, Skp2 blocks FoxO1 and, together with
CKS1b, CDK1 and CDK2, triggers the direct degradation of Cdkn1a and
Cdkn1b, thus promoting proliferation (PMID: 15668399). FoxO proteins are
inhibited mainly through PI3K/AKT-mediated phosphorylation (PMID:
10102273; PMID: 12621150; PMID: 21708191; PMID: 10217147; PMID:
17604717): PDK1, the master kinase of the pathway, stimulates cell
proliferation and survival by directly activating AKT, which phosphorylates
(inhibits) the FoxOs (PMID: 10698680; PMID: 19635472). The PI3K/AKT/
FoxO1 circuit requires active TGF-β/SMAD signalling (PMID: 24238962;
PMID: 15084259) in order to co-regulate Cdkn1a-dependent cell senescence.
Active TGF-β signalling downregulates the BMP pathway downstream
effectors ID1 and ID2, known to promote dedifferentiation and proliferation
during embryogenesis and cancer progression, probably through Cdkn2b
regulation (PMID: 11840321; PMID: 16034366). b, β-cell ablation in adults
triggers FoxO1 upregulation and the subsequent cell cycle arrest in δ-cells.
Extended Data Figure 10 | δ-cell dedifferentiation in adult mice upon
transient FoxO1 inhibition. a–d, The 1 week FoxO1 inhibition with the
compound AS1842856 in control unablated adult mice (a) results in
dedifferentiation of one-fourth of the δ-cell population (b; Supplementary
Table 30) (treated: n = 3, 1,347 YFP+ cells scored; untreated: n = 4, 1,224
YFP+ cells scored; error bars show s.d.), without leading to insulin
(c; Supplementary Table 31) (treated: n = 3, 3,249 insulin+ cells scored;
untreated: n = 4, 9,562 insulin+ cells scored; error bars show s.d.; Welch's test
(P = 0.1590)) or glucagon (d; Supplementary Table 32) (treated: n = 2, 728
YFP+ cells scored; error bars show s.e.m.) expression. e, One month following
FoxO1 transient inhibition in β-cell-ablated adults, dedifferentiated δ-cells do
not express glucagon (Supplementary Table 36) (treated: n = 2, 986 YFP+ cells
scored; error bars show s.e.m.). f, Transient FoxO1 inhibition a long time
(1 month) after β-cell ablation also leads to the appearance of lineage-traced
dedifferentiated δ-cells that express insulin (Supplementary Tables 37-39)
(treated: n = 3, 71 islets scored; 300 insulin+ cells scored; 1,216 YFP+ cells
scored; error bars show s.d.). Scale bars, 20 μm.
High-fat-diet-mediated dysbiosis promotes
intestinal carcinogenesis independently of obesity


Manon D. Schulz, Çiğdem Atay, Jessica Heringer, Franziska K. Romrig, Sarah Schwitalla, Begüm Aydin, Paul K. Ziegler,
Julia Varga, Wolfgang Reindl, Claudia Pommerenke, Gabriela Salinas-Riester, Andreas Böck, Carl Alpert, Michael Blaut,
Sara C. Polson, Lydia Brandl, Thomas Kirchner, Florian R. Greten, Shawn W. Polson & Melek C. Arkan


Several features common to a Western lifestyle, including obesity
and low levels of physical activity, are known risk factors for
gastrointestinal cancers¹. There is substantial evidence suggesting that diet
markedly affects the composition of the intestinal microbiota². Moreover,
there is now unequivocal evidence linking dysbiosis to cancer
development³. However, the mechanisms by which high-fat diet (HFD)-mediated
changes in the microbial community affect the severity of
tumorigenesis in the gut remain to be determined. Here we demonstrate
that an HFD promotes tumour progression in the small intestine
of genetically susceptible, K-rasG12Dint, mice independently
of obesity. HFD consumption, in conjunction with K-ras mutation,
mediated a shift in the composition of the gut microbiota, and this
shift was associated with a decrease in Paneth-cell-mediated antimicrobial
host defence that compromised dendritic cell recruitment and
MHC class II molecule presentation in the gut-associated lymphoid
tissues. When butyrate was administered to HFD-fed K-rasG12Dint
mice, dendritic cell recruitment in the gut-associated lymphoid tissues
was normalized, and tumour progression was attenuated. Importantly,
deficiency in MYD88, a signalling adaptor for pattern
recognition receptors and Toll-like receptors, blocked tumour progression.
The transfer of faecal samples from HFD-fed mice with
intestinal tumours to healthy adult K-rasG12Dint mice was sufficient
to transmit disease in the absence of an HFD. Furthermore, treatment
with antibiotics completely blocked HFD-induced tumour progression,
suggesting that distinct shifts in the microbiota have a pivotal
role in aggravating disease. Collectively, these data underscore the
importance of the reciprocal interaction between host and environmental
factors in selecting a microbiota that favours carcinogenesis,
and they suggest that tumorigenesis is transmissible among genetically
predisposed individuals.

Undoubtedly, a variety of factors contribute to the aetiology of intestinal
cancer. There are compelling arguments to include an HFD and
the composition of the gut microbiota as key risk factors⁴. Given the
rapid increase in the incidence of diet-induced obesity worldwide⁵ and
the recent evidence that the microbiota in obese individuals is more
efficient at harvesting nutrients⁶,⁷, it is crucial to understand the role of
the gut microbiota in the pathogenesis of cancer.

We set out to elucidate whether alterations in the gut microbial community
are the link between an HFD and pathogenesis of intestinal
cancer. To that end, we used a well-characterized serrated hyperplasia
model with oncogenic K-ras expression in the intestinal epithelium
(K-rasG12Dint mice)⁸ and exposed these mice to an HFD regimen for 22
weeks (Fig. 1a). Whereas 33% of age-matched K-rasG12Dint mice on a
normal diet developed only murine serrated hyperplasia along the
intestine, HFD consumption led to further tumour progression in 60%
of K-rasG12Dint mice (Fig. 1b). These mice developed tumours in the
duodenum that ranged from murine serrated adenoma, with low-grade
dysplasia (mSA-LGD) and high-grade dysplasia (mSA-HGD), to invasive
carcinoma, closely recapitulating the carcinogenic sequence in
humans (Fig. 1c). These cells metastasized to the liver, pancreas and
spleen when mice were maintained on an HFD for 40 weeks (Extended
Data Fig. 1a). Serrated tumours revealed increased proliferation at the
tips of the villi; such proliferation was otherwise restricted to the crypt
region (Extended Data Fig. 1b). Importantly, diet-induced obesity was
compromised in K-rasG12Dint mice during disease progression (Extended
Data Fig. 1c). In accordance with the increased insulin sensitivity observed
in these animals, lipid accumulation in the liver was lower than
in littermate controls that were fed an HFD for 22 weeks (Extended Data
Fig. 1d, e). Taken together, these data suggest that diet-mediated effects
on the host were responsible for promoting serrated tumour progression
in the small intestine.

An HFD induces a low-grade inflammatory state⁹, and inflammation
is a hallmark of cancer¹⁰. Unexpectedly, tumour-necrosis factor-α (TNF-α),
as well as the macrophage cell surface marker F4/80, were downregulated
in the duodenum of K-rasG12Dint mice (Extended Data Fig. 1f).
Moreover, duodenal samples from K-rasG12Dint animals exhibited a
significant reduction in the expression of genes involved in the recognition
of, and immune response to, antigens, suggesting that an oncogene-driven
downmodulation of host immunity had occurred (Fig. 1d
and Extended Data Fig. 2a). This downregulation was associated with
decreased Paneth-cell-mediated antimicrobial defence and altered expression
of differentiation markers in the small intestine (Extended Data
Fig. 2b, c). On exposure to bacteria or bacterial antigens, Paneth cells
secrete defensins (cryptdins), which have antibacterial function during
immune responses and thereby shape the composition of the microbiota¹¹.
This activity was significantly decreased in K-rasG12Dint mice regardless
of the tumour incidence (Extended Data Fig. 2a, c). The intestinal
epithelium is covered by a mucus layer, which is largely composed of
mucins and provides a physical barrier, thereby limiting damage to the
epithelium and enhancing gut homeostasis by delivering tolerogenic
signals¹². Indeed, the expression of the mucin MUC2 was significantly
decreased after HFD intake (Extended Data Fig. 2c). In agreement with
the duodenal gene expression data, MHC class II expression in CD11c+
and CD11b+ cell populations in the lamina propria (Fig. 1e), as well as
in the Peyer's patches (Extended Data Fig. 2d), was significantly reduced
in K-rasG12Dint mice regardless of diet. Taken together, these data
suggest that diminished cryptdin expression by Paneth cells, owing to
oncogene activation in combination with altered mucin profiles caused
Figure 1 | An HFD accelerates cancer progression. a, The experimental
scheme shows when and how long the special diet (HFD) was administered
before final analysis. ND, normal diet. b, Histological scores show tumour
incidence in the small intestine with the HFD regimen. Each point represents
one animal, and lines indicate the mean (ND, LSL-K-rasG12D/+ controls n = 5,
K-rasG12Dint mice, n = 6; HFD, LSL-K-rasG12D/+ controls n = 13, K-rasG12Dint
mice n = 16). The scores were assessed using a two-sided Fisher test, and
adjustments on pairwise t-tests were made using the single-step method.
*, P ≤ 0.05. c, Histology of the small intestine from K-rasG12Dint mice during
murine serrated adenoma (mSA) development and comparison with the
human serrated route. HGD, high-grade dysplasia; LGD, low-grade dysplasia;


by an HFD, could render K-rasG12Dint mice susceptible to dampened
immunity.

An HFD can lead to marked changes in the secretory, absorptive and
immune function of the gut by regulating microbial communities¹³. To
of representative top candidate genes analysed by microarray analysis in
duodenal samples from littermate LSL-K-rasG12D/+ controls and K-rasG12Dint
mutants on a ND or HFD regimen (n = 3 per group). P values were obtained
from the moderated t-statistic and corrected for multiple testing with the
Benjamini-Hochberg method. e, Representative flow cytometric analysis
showing MHC class II expression in CD11c+ and CD11b+ cell populations
from the lamina propria (LP) in littermates (n = 2 per group). Percentages do
not add to 100% because of rounding errors.
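
The caption states that P values were corrected for multiple testing with the Benjamini-Hochberg method. A minimal sketch of that adjustment step is given below, assuming a plain list of raw P values; the numbers are hypothetical, the function name is chosen for illustration, and the moderated t-statistic that produced the original P values is not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order.

    Illustrative helper, not the authors' pipeline.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, so that each adjusted value is
    # never larger than the adjusted value of the next-ranked test.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for five genes.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(raw_p)
print(adjusted)
```

A gene would then be called differentially expressed when its adjusted value falls below the chosen false-discovery threshold.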


define whether HFD-induced alterations in the microbiota were associated
with the increased tumour incidence, amplicons generated from
small intestinal and colonic faecal DNA were subjected to 16S ribosomal
RNA gene sequencing. The HFD altered the community diversity in


Figure 2 | Diet-induced tumour progression is associated with an
altered microbial community. Linear discriminant analysis (LDA)
effect size (LEfSe) results show bacteria that were significantly
different in abundance between K-rasG12Dint mice on a ND and an
HFD regimen, and they indicate the effect size of each differentially
abundant bacterial taxon in the small intestine (ND, n = 3; HFD,
n = 8) (a) and the colon (ND, n = 3; HFD, n = 7) (b).
Figure 3 | Butyrate supplementation, but not prebiotics, confers protection
against HFD-induced tumorigenesis. a, The experimental scheme indicates
additional nutritional supplementation during the HFD and the time point
of data analysis. Histological scores for the small intestine in K-rasG12Dint mice
on an HFD (n = 16), after arabinogalactan supplementation (LSL-K-rasG12D/+
controls, n = 5; K-rasG12Dint mice, n = 5), after galacto-oligosaccharide
(GOS) supplementation (LSL-K-rasG12D/+ controls, n = 10; K-rasG12Dint
mice, n = 12) and after butyrate supplementation (LSL-K-rasG12D/+ controls,
n = 6; K-rasG12Dint mice, n = 7). Each point represents one animal, and the lines
indicate the mean. The scores were assessed using a one-way analysis of
variance (ANOVA), and adjustments on pairwise t-tests were made using the
single-step method. *, P ≤ 0.05; NS, not significant. b, LEfSe results show
the bacteria that were significantly different in abundance between K-rasG12Dint
mice on an HFD that were treated (n = 5) or not treated (n = 8) with butyrate,


the intestine compared with the normal diet. The abundance of
Helicobacteraceae, Lactobacillaceae, Enterobacteriaceae, Clostridiaceae and
Peptostreptococcaceae was significantly higher in the small intestine
of K-rasG12Dint mice fed an HFD than in those fed a normal diet, whereas
the abundance of Bifidobacteriaceae, Porphyromonadaceae and
Alcaligenaceae was significantly lower (Fig. 2a). Moreover, although no
tumours were detected in the colon, the abundance of Enterobacteriaceae,
Desulfovibrionaceae, Porphyromonadaceae, Rikenellaceae,
Ruminococcaceae, Lachnospiraceae, Coriobacteriaceae and Deferribacteraceae
was significantly higher in the colonic tissue of K-rasG12Dint mice fed
an HFD than in those fed a normal diet, whereas the abundance of
Bifidobacteriaceae, Peptostreptococcaceae, Roseburia and Butyricicoccus
was significantly lower (Fig. 2b). Overall, the HFD initiated a major
structural change in the gut microbiota in tumour-bearing K-rasG12Dint
mice (Fig. 2 and Extended Data Fig. 3a-c).

Bacteria are sensed through pattern recognition receptors, Toll-like 
receptors and the downstream adaptor protein MYD88 (ref. 14). Inter- 
estingly, systemic ablation of Myd88 conferred complete protection against 
tumour progression, suggesting a causal role for intestinal microbiota 
in the tumorigenic process (Extended Data Fig. 4a). To distinguish which 
Myd88-deficient cell type conferred protection against tumorigenesis,
K-rasG12Dint mice were either crossed to Myd88F/F animals or transplanted
with Myd88-deficient bone marrow to restrict Myd88 deletion
to intestinal epithelial cells (IECs) or haematopoietic cells, respectively.
K-rasG12Dint mice continued to develop invasive cancer when Myd88
deletion was restricted to the IECs (Extended Data Fig. 4a). Although
and they indicate the effect size of each differentially abundant bacterial
taxon in the small intestine. c, The gene expression profiles of selected top
candidate genes involved in antigen recognition, the immune response and
immune cell recruitment in duodenal samples from LSL-K-rasG12D/+ controls
and littermate K-rasG12Dint animals (n = 3 per group) that were on the ND
or HFD regimen and were treated or not treated with butyrate. P values were
obtained from the moderated t-statistic and corrected for multiple testing
with the Benjamini-Hochberg method. d, Flow cytometric analysis of LP cells
from LSL-K-rasG12D/+ control and K-rasG12Dint littermate animals on an
HFD treated with butyrate (LSL-K-rasG12D/+ controls, n = 4; K-rasG12Dint mice,
n = 7) or not treated with butyrate (LSL-K-rasG12D/+ controls, n = 2;
K-rasG12Dint mice n = 2). P values were determined by one-way ANOVA and
adjusted for the number of comparisons by using the Bonferroni method.
Error bars indicate s.e.m. *, P ≤ 0.05.


the adoptive transfer of Myd88-deficient bone marrow slightly reduced 
tumour incidence, it did not prevent tumour progression, as the num- 
ber of invasive cancers was comparable to that in mice that had received 
wild-type bone marrow cells (Extended Data Fig. 4a). 

Defective innate immunity has been suggested to shape a distinct
intestinal microbiota in mice¹⁵⁻¹⁷. To test whether the discrepancy in
tumour incidence between germline and tissue-specific Myd88 deficiency
was due to a change in the gut microbiota, we surveyed microbial
community diversity. Systemic deletion of Myd88 led to an increase
in Peptostreptococcaceae, Deferribacteraceae, and butyrate-producing 
Ruminococcaceae (Extended Data Fig. 4b). Furthermore, Enterococcaceae, 
the abundance of which was consistently increased after tissue-specific 
Myd88 deletion, was not detectable when Myd88 was deleted from the 
germ line (Extended Data Fig. 5a–c). Remarkably, oncogene- and
diet-uncoupled antigen presentation in dendritic cells of the lamina propria
was significantly attenuated after systemic Myd88 deletion (Extended 
Data Fig. 4c). A possible explanation for the differential susceptibility 
to cancer is that in systemic deletion, the deletion of Myd88 in IECs and 
haematopoietic cells has an additive protective effect. Alternatively,
systemic Myd88 deficiency might have prevented tumour progression through
mechanisms that are associated not only with community changes or
mucosal immunity but also with altered IEC differentiation during
embryogenesis.

The ingestion of dietary fibre promotes short-chain fatty acid (SCFA) 
formation and has beneficial effects on health by selectively affecting a 
restricted number of bacterial genera (Bifidobacterium and Lactobacillus)

Figure 4 | Disease-associated bacteria can be transmitted to healthy
K-ras^G12Dint animals, and antibiotic treatment abolishes tumorigenesis.
a, The experimental scheme indicates the time of stool transfer from HFD-fed
donors that had been on the diet regimen for 24 weeks to healthy 7-week-old
K-ras^G12Dint recipients on a ND after 1 week of antibiotic (Abx) treatment.
The histological score of the small intestine suggested that a disease-associated
microbiota had been delivered, as well as increased tumour incidence in the
recipient K-ras^G12Dint mice (LSL-K-ras^G12D/+ controls, n = 6; K-ras^G12Dint mice,
n = 11). Each point represents one animal, and the lines indicate means. The
scores between K-ras^G12Dint mice treated or not treated with stool samples were
assessed using a two-sided, two-sample Welch’s t-test. *, P ≤ 0.05. b, LEfSe
results show bacteria that were significantly different in abundance between
ND-fed K-ras^G12Dint mice that underwent stool transfer (n = 6) or did not
undergo stool transfer (n = 3). c, Flow cytometric analysis of LP cells from mice
after stool transfer (ND, LSL-K-ras^G12D/+ controls n = 2, K-ras^G12Dint mice
n = 2; ND + stool transfer, LSL-K-ras^G12D/+ controls n = 7, K-ras^G12Dint mice
n = 10). P values were determined by one-way ANOVA and adjusted for the


in the gut'®. HFD-induced tumour progression was correlated with a 
significant reduction in acetate, propionate and butyrate concentra- 
tions in faecal samples (Extended Data Fig. 6a, b). To ascertain whether 
diet-induced tumorigenesis could be halted through further changes in 
the microbiota, prebiotics were administered to mice. Interestingly, ara- 
binogalactan supplementation did not provide any protective effect 
(Fig. 3a). However, different prebiotics are available, with the key dif- 
ferentiating factor being the length of the chemical chain, which deter- 
mines where in the gastrointestinal tract the prebiotic exerts its effect’. 
Supplementation of the diet with galacto-oligosaccharide (GOS) did 
not affect tumour incidence (Fig. 3a) but slightly increased the number 
of tumours per mouse. CD11c+ dendritic cell numbers and MHC class
II presentation on CD11c+ dendritic cells in the lamina propria and
mesenteric lymph nodes (MLNs) (Extended Data Fig. 6c), as well as
expression of the genes involved in the immune response (Extended
Data Fig. 7a), were equally reduced in HFD-fed K-ras^G12Dint mice that
had been treated with GOS and those that had not (compare with
Extended Data Fig. 2). The lack of a protective effect could be due to the
prebiotics having little or no effect on SCFA production throughout the 
gut (Extended Data Fig. 7b). In addition, SCFA levels were significantly 
lower in the small intestine than in colonic and faecal samples (Extended 
Data Fig. 6b). 


number of comparisons by using the Bonferroni method. Error bars indicate
s.e.m. *, P < 0.05; **, P < 0.01; ***, P < 0.001; ***, P < 0.001 compared
with littermate controls. d, The experimental scheme indicating the application
of the antibiotic regimen during the course of the HFD and the time point for
data analysis. The histological scores of the small intestine samples showed
complete protection against tumour formation in the HFD-fed K-ras^G12Dint
mice treated with antibiotics (HFD + Abx, LSL-K-ras^G12D/+ controls, n = 7;
K-ras^G12Dint mice, n = 7). Each point represents one animal, and the lines
indicate means. The scores were assessed using a two-sided Fisher test, and
adjustments on pairwise t-tests were made using the single-step method.
**, P ≤ 0.01. e, Flow cytometric analysis of LP cells showing the recruitment of,
and surface antigen presentation on, CD11c+ dendritic cells in disease-free
K-ras^G12Dint mice after treatment with antibiotics (HFD + Abx,
LSL-K-ras^G12D/+ controls n = D5 K-ras^G12Dint mice n = 7; HFD, LSL-K-ras^G12D/+
controls n = 2, K-ras^G12Dint mice n = 2). P values were determined by one-way
ANOVA and adjusted for the number of comparisons by using the Bonferroni
method. Error bars indicate s.e.m. The differences are not significant.


Therefore, we next sought to determine whether diet-induced tumour 
progression could be prevented when mice were orally treated with bu- 
tyrate. SCFAs, the end products of colonic bacterial fermentation, have 
several beneficial effects on the host. They serve as an energy source, 
modulate intestinal motility, are a defence barrier and have been suggested 
to have immunoregulatory functions’. Although butyrate supplemen- 
tation led to only a minor increase in the faecal butyrate concentration 
(Extended Data Fig. 8a), it markedly reduced the tumour incidence 
(Fig. 3a). This decrease was associated with a significant increase in the 
abundance of Bifidobacteriaceae and Porphyromonadaceae and a sharp
decrease in Helicobacteraceae in the small intestine (Fig. 3b). Interestingly,
branching, serration and proliferation were partially blocked in
K-ras^G12Dint mice on treatment with butyrate (Extended Data Fig. 8b).
The HFD-induced decrease in the expression of Muc2 (Extended Data 
Fig. 8c) and genes involved in antigen recognition and the immune re- 
sponse (Fig. 3c) was partially restored towards the level of the normal 
diet group. Moreover, compromised dendritic cell recruitment was nearly 
normalized (Fig. 3d). Although most studies have focused on colonic 
bacteria, the small intestine has a crucial role in carbohydrate and fat 
uptake”*. However, the concentration of SCFAs in the small intestine 
remained lower than that in colonic or faecal samples (Extended Data 
Fig. 6b). Butyrate seemed to be a potent SCFA, with systemic effects on
metabolic parameters during diet-induced obesity (Extended Data Fig. 8d),
although histone H3 and H4 acetylation remained unaffected (Extended
Data Fig. 8e). These results indicate that butyrate exerts its protective
effect on tumorigenesis at least in part through changes in bacterial
composition and through regulating K-Ras signalling.

To confirm the causal relationship between diet-induced dysbiosis 
and intestinal cancer, mice on the normal diet were treated for 1 week 
with an antibiotic mixture and then colonized with fresh faecal samples 
from HFD-fed K-ras^G12Dint donors. Remarkably, disease was transmitted
to otherwise healthy K-ras^G12Dint mice on a normal diet but not to
LSL-K-ras^G12D/+ controls (Fig. 4a and Extended Data Fig. 9a). Comparison
of the microbial community composition between K-ras^G12Dint mice
that did and did not undergo stool transfer showed a higher abundance 
of Lactobacillaceae, Helicobacteraceae and Clostridiales in stool recipients, 
reflecting the transfer of the HFD-shaped microbiota to the normal diet 
phenotype (Fig. 4b and Extended Data Fig. 9b). MUC2 expression was 
significantly lower in mice that received the stool than in those that did 
not, reminiscent of the HFD group (Extended Data Fig. 10a). The recruit- 
ment of dendritic cells and MHC class II presentation by dendritic cells 
in the lamina propria and Peyer’s patches were also compromised in mice 
that received the stool (Fig. 4c and Extended Data Fig. 9c), and glucose 
clearance during a glucose tolerance test was similar to that of the normal 
diet group (compare Extended Data Fig. 9d with Fig. 1d). These findings 
provide evidence that a diet-shaped microbiota synergizes with oncogenic 
K-ras during tumorigenesis in the intestine, independently of obesity. 

To further support the evidence that a distinct shift in bacterial com- 
munity has a causative role in tumour progression, the mice were treated 
with antibiotics. In support of the stool transfer experiments and the 
existence of a disease-associated microbiota, antibiotic supplementation
abolished tumour formation in K-ras^G12Dint mice (Fig. 4d). The
HFD-mediated downregulation of genes involved in the immune response,
the mucin profile and the differentiation markers in the duodenum
was less pronounced after antibiotic treatment (Extended Data
Fig. 7a). MHC class II presentation by dendritic cells was still compro- 
mised in the lamina propria after antibiotic treatment (Fig. 4e) but not 
in the MLNs (Extended Data Fig. 10b). Taken together, these results 
suggest a critical role for diet-shaped dysbiotic bacteria in aggravating 
oncogene-driven intestinal carcinogenesis (Extended Data Fig. 10c). 

The perturbation of immunoregulatory functions by a dysbiotic microbiota
is becoming increasingly recognized as a hallmark of immune-mediated
diseases’. However, diet-associated cancer development may
be based on marked shifts in bacterial communities rather than on the 
development of obesity and metabolic disorder. Thus, personalized die- 
tary interventions might allow an individual’s microbiota to be mo- 
dulated to promote health, especially in those who are at a high risk 
because of genetic susceptibility and a high fat intake. 


METHODS SUMMARY 


K-ras^G12Dint mice on a C57BL/6J;129 background have been described previously’.
The mice were fed either with a total pathogen-free diet (Altromin, catalogue no. 1314;
percentage of total calories, 27% protein, 60% carbohydrate and 13% fat)
(Supplementary Table 1) or a γ-irradiated HFD (Research Diets, catalogue no. D12492;
percentage of total calories, 20% protein, 20% carbohydrate and 60% fat) (Supplementary
Table 2). Sections from the small intestine and colon were evaluated and scored by 
pathologists in a blinded manner. For immunohistochemistry, paraffin-embedded 
sections were stained according to standard procedures using a monoclonal rat anti- 
5-bromodeoxyuridine (BrdU) antibody (1:200; AbD Serotec, catalogue no. MCA 2060) 
and counterstained with haematoxylin. BrdU (100 mg per kg body weight; Sigma, 
catalogue no. B9285) was intraperitoneally injected 2 h before the mice were killed. 


Online Content Methods, along with any additional Extended Data display items
and Source Data, are available in the online version of the paper; references unique
to these sections appear only in the online paper.

METHODS


Mice. Littermate mice were co-housed randomly regardless of their genotype during
the various diet and additional nutritional supplementation regimens. However,
during stool transfer experiments, the mice were separated and co-housed according
to their genotypes to avoid cross contamination. During treatment with antibiotics,
the drinking water was supplemented with a mixture of 0.2 g l⁻¹ ampicillin,
0.1 g l⁻¹ vancomycin, 0.2 g l⁻¹ neomycin and 0.2 g l⁻¹ metronidazole. During transfer
experiments, following antibiotic treatment for 1 week, 7-week-old mice were
colonized with fresh stool pellets (9 × 10° bacteria) from mice that had been fed
the HFD for 24 weeks. GOS (Bi2muno) (5.5 g) was provided in 150 ml drinking
water, which was refreshed every 2 days. Sodium butyrate (Aldrich, catalogue no.
303410) (50 µg per g body weight in 100 µl water) and arabinogalactan (Fluka,
catalogue no. A-09788) (0.01 per g body weight in 100 µl water) were provided orally
three times a week. K-ras^G12Dint mice were further crossed to Myd88^fl/fl and Myd88^-/-
mice (Jackson Laboratory). During adoptive transfer experiments, 6-week-old
K-ras^G12Dint mice were irradiated (9 Gy), and 2 × 10° bone marrow cells from Myd88^-/-
or wild-type mice were transferred by tail vein injection to recipients. The
procedures were approved by the Regierung von Oberbayern.
Human samples. Human samples were analysed after irreversible anonymization 
of the patients’ personal data. Histological sections were microscopically evaluated 
by pathologists after haematoxylin and eosin staining. All procedures were performed 
according to the recommendation of the ethics committee of Ludwig Maximilians 
University. 
Glucose tolerance test (GTT). Following 9 h of fasting, the basal glucose level was 
detected using a glucometer (Bayer, catalogue no. 3822850). Mice were injected with 
1.5 g glucose per kg body weight (40% glucose solution, Eifelfango), and blood glu- 
cose levels were recorded for up to 2h. Plasma samples were collected simulta- 
neously, and insulin secretion was measured with the Ultra Sensitive Rat Insulin 
ELISA Kit (Crystal Chem, catalogue no. 90060) using a mouse standard (Crystal 
Chem, catalogue no. 90070). The tests were assessed in a blinded manner. 
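
As a worked illustration of the dosing arithmetic in the glucose tolerance test, the sketch below converts the stated dose (1.5 g glucose per kg body weight) into an injection volume. It assumes that "40% glucose solution" means 40% weight/volume (0.40 g per ml), which the text does not state; the function name and the 25 g example mouse are hypothetical.

```python
def glucose_injection_volume_ul(body_weight_g, dose_g_per_kg=1.5, solution_g_per_ml=0.40):
    """Return the injection volume in microlitres for one mouse.

    Assumption: the 40% glucose solution is weight/volume, i.e. 0.40 g per ml.
    """
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    # Dose in grams of glucose: (g per kg) x (body weight converted from g to kg).
    dose_g = dose_g_per_kg * body_weight_g / 1000.0
    # Volume in ml is dose divided by concentration; convert ml to microlitres.
    return dose_g / solution_g_per_ml * 1000.0


if __name__ == "__main__":
    # A hypothetical 25 g mouse needs 0.0375 g glucose.
    print(round(glucose_injection_volume_ul(25.0), 2))
```

Under these assumptions a 25 g mouse would receive just under 0.1 ml of solution.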
RNA analysis. Total RNA was extracted from the small and large intestine using
the RNeasy Mini Kit (QIAGEN). cDNA was synthesized using SuperScript II Reverse
Transcriptase (Invitrogen). Real-time PCR analysis using Power SYBR Green
PCR Master Mix (Applied Biosystems) was carried out on a StepOnePlus Real-Time
PCR System (Applied Biosystems). The primer sequences used are as follows:
F4/80, Fw 5'-CTTTGGCTATGGGCTTCCAGTC-3', Rev 5'-GCAAGGAGGACAGAGTTTATCGTG-3';
IL-1β, Fw 5'-GTGGCTGTGGAGAAGCTGTG-3', Rev 5'-GAAGGTCCACGGGAAAGACAC-3';
IL-6, Fw 5'-GTATGAACAACGATGATGCACTTG-3', Rev 5'-ATGGTACTCCAGAAGACCAGAGGA-3';
TNF-α, Fw 5'-ACTCCAGGCGGTGCCTATG-3', Rev 5'-GAGCGTGGTGGCCCCT-3';
sucrase-isomaltase, Fw 5'-CAACCTCGGCAAAACCTTTATAGT-3', Rev 5'-TGCAGCCTCTCTCTACGCAA-3';
synaptophysin, Fw 5'-TTCGTGAAGGTGCTGCAGTG-3', Rev 5'-TCTCCGGTGTAGCTGCCG-3';
cryptdin, Fw 5'-CAGCCGGAGAAGAGGACCAG-3', Rev 5'-TAGCATACCAGATCTCTCAACGATTC-3';
MUC1, Fw 5'-GAGCCAGGACTTCTGGTAGGCT-3', Rev 5'-GGCTTCACCAGGCTTACGTAGT-3';
MUC2, Fw 5'-TCGCCCAAGTCGACACTCA-3', Rev 5'-GCAAATAGCCATAGTACAGTTACACAGC-3';
MUCS, Fw 5'-GATCCATCCATCCCATTTCTACC-3', Rev 5'-TTGCTTATCTGACTACCACTTGTTGA-3';
REG3A, Fw 5'-GGTGAGGCTTCCTTTGTGTCC-3', Rev 5'-CTCCATTGGGTTGTTGACCC-3';
MHC class II Ab1, Fw 5'-GCGCATACGATATGTGACCAGAT-3', Rev 5'-GCGGTGCTCGCCCA-3';
MHC class II DMA, Fw 5'-CGTTGGTCTGTTTCATCAGCA-3', Rev 5'-ATCGACAGCTGAGATGGATGTG-3';
CCL20, Fw 5'-GGTGGCAAGCGTCTGCTC-3', Rev 5'-GCCTGGCTGCAGAGGTGA-3';
defensin, Fw 5'-TCGTTCTGCTGGCCTTCC-3', Rev 5'-CCTGGCTGTTCCTCAGTTTTAGTC-3';
FcR, Fw 5'-TTGCTCCTTTTGGTGGAACAA-3', Rev 5'-GGACAATACCATACAAAAACAGGACA-3';
CLEC4N, Fw 5'-CCCCCATGAACCCAATCTT-3', Rev 5'-CAGCCCCATTTCGAAGGA-3';
CLEC7A, Fw 5'-GTGCAGTAAGCTTTCCTGGG-3', Rev 5'-TCCCGCAATCAGAGTGAAG-3';
and cyclophilin, Fw 5'-ATGGTCAACCCCACCGTGT-3', Rev 5'-TTCTGCTGTCTTTGGAACTTTGT-3'.
Microarray analysis using the GeneChip Mouse Gene 1.0 ST array. RNA iso- 
lation from three biological replicates of duodenal samples was carried out using 
the TRIzol (Invitrogen) method according to the manufacturer’s instructions. The 
samples were treated with DNase I (Sigma-Aldrich). RNA quality was checked by 
microfluidic electrophoresis using the 2100 Bioanalyzer (Agilent Technologies), and 
only samples with comparable RNA integrity numbers were processed for micro- 
array analysis. 

cDNA was synthesized using 0.3 µg of total RNA. The synthesis of the
double-stranded cDNA was performed using WT Target Labeling and Control Reagents
(Affymetrix), and the clean-up was carried out using the GeneChip Sample
Cleanup Module (Affymetrix).

In vitro transcription was conducted using the WT Target Labeling Kit. The total
amount of the reaction product was purified using the GeneChip Sample Cleanup
Module and quantified with the ND-1000 (NanoDrop, Thermo). cDNA synthesis
(single strand, ss) was carried out with the WT Target Labeling Kit, and total
ssDNA (5.5 µg) was enzymatically cleaved into 35-200 base-pair fragments. The
degree of fragmentation and the length distribution of the ssDNA were analysed
by capillary electrophoresis using the 2100 Bioanalyzer. Following fragmentation,
a terminal labelling reaction (with biotin) was conducted using the WT Labeling Kit.

Biotinylated fragmented ssDNA was then hybridized onto the GeneChip Mouse
Gene 1.0 ST Array (Affymetrix) according to the manufacturer’s instructions. The
hybridization was carried out at 45 °C in the GeneChip Hybridization Oven 640
(Affymetrix) for 16 h. Washing and staining of the arrays on the GeneChip Fluidics
Station 450 (Affymetrix) were performed according to the manufacturer’s
recommendations. Amplification of the antibody signal and washing and staining
protocols were also according to the manufacturer’s instructions and were used to stain
the arrays with streptavidin R-phycoerythrin (SAPE, Invitrogen). The arrays were
incubated twice with SAPE solution, together with a biotinylated anti-streptavidin
antibody (Vector Labs) staining step, to amplify staining. Finally, the arrays were
scanned using the GeneChip Scanner 3000 7G (Affymetrix).
Microarray data processing and statistical analysis. AGCC Software (version 2.0, 
Affymetrix) was used for the extraction of intensity data, which were analysed using 
the affy** and Limma™ packages of Bioconductor”. The analysis was carried out 
exactly as previously described’®. The data analysis consisted of between-array
normalization, probe summary, global clustering and principal component
analysis (PCA), fitting of the data to a linear model and detection of differential
gene expression. To ensure that the intensities had similar distributions across arrays,
quantile normalization was applied to the log-transformed intensity values as a
method for between-array normalization”. For the summary of probes, a
median polish procedure was chosen.
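
The idea behind quantile normalization can be sketched in a few lines. This is a minimal illustration in plain Python, not the Bioconductor implementation used for the analysis. It assumes every array holds the same number of log-transformed probe intensities, and it breaks ties by original order, which is a simplification. The function name is hypothetical.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Give every array the same intensity distribution.

    `arrays` is a list of equally long lists of log-intensities (one per array).
    Ties are broken by original probe order, a simplifying assumption.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n_probes)]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity in this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After normalization each array contains exactly the same set of values, and only the assignment of values to probes differs between arrays.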

Significant changes in the expression of genes between the groups were analysed 
by empirical Bayes statistics, by moderating the standard errors of the estimated 
values”®. 

P values obtained from the moderated t-statistic were corrected for multiple 
testing with the Benjamini-Hochberg method”. P-value adjustments guarantee a 
smaller number of false positive findings by controlling the false discovery rate 
(fdr). For each gene, the null hypothesis suggesting that there is no differential ex- 
pression between degradation levels was rejected when the fdr was lower than 0.05. 
Samples were assessed in a blinded manner. 
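
The Benjamini-Hochberg adjustment described above can be illustrated with a short, self-contained sketch. It is written in plain Python for clarity and is not the Bioconductor code used in the study; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, so that adjusted values
    # never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene P values
    adj = benjamini_hochberg(raw)
    # Genes whose adjusted value is below 0.05 would be called differentially expressed.
    print([round(a, 4) for a in adj])
```

A gene is then reported when its adjusted value falls below the chosen threshold, here 0.05 as in the text.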

Cell isolation from lamina propria. Intestinal tissue was cut into small pieces, 
shaken vigorously in 25 ml RPMI and washed three times with PBS. Tissue pieces 
were then incubated in 50 ml PBS containing 0.015 g dithiothreitol (DTT) and 500 µl
0.5 M EDTA with constant shaking for 20 min at 37 °C. The pieces were then collected
and transferred to 10 ml RPMI containing 50 µl DNase I grade II (100 mg ml⁻¹)
and 50 µl collagenase D (100 mg ml⁻¹) and then incubated for 25 min in a rotating
incubator at 37 °C. Cells were passed through a cell strainer and centrifuged at
194g for 5 min, and the resultant pellet was resuspended in fluorescence-activated
cell sorting (FACS) buffer (2% FCS in PBS). 

Cell isolation from the Peyer’s patches. The Peyer’s patches were excised from 
the intestine. They were incubated for 20 min in a mixture of 3 ml RPMI, 15 µl
collagenase D (100 mg ml⁻¹) and 15 µl DNase I grade II (20 mg ml⁻¹) at 37 °C. The
solution was then passed through a cell strainer and incubated for 10 min at 37 °C.
The reaction was stopped by the addition of 10 µl 0.5 M EDTA, and the volume was
adjusted to 10 ml with RPMI. After centrifugation at 249g for 5 min, the cells were
resuspended in FACS buffer. 

Cell isolation from the MLNs. The MLNs were collected in 5 ml 2% FCS in PBS, 
sheared on glass slides and passed through a 100 µm filter and then a 70 µm filter.
Following centrifugation at 194g for 5 min at 4 °C, the erythrocytes were lysed with 
1 ml of Red Blood Cell Lysing Buffer (Sigma) for 5 min at 20°C. The pellets were 
then washed in 2% FCS in PBS, centrifuged at 194g for 5 min and resuspended in 
FACS buffer. 

FACS. After cell isolation from the lamina propria, Peyer’s patches and MLNs,
approximately 1 × 10° cells were stained with 0.5 µg ml⁻¹ ethidium monoazide (EMA)
(Sigma, catalogue no. E2028) to stain dead cells for 15 min under a light source. Cells
were washed and incubated with mouse BD Fc Block (BD Pharmingen, clone 2.4G2,
catalogue no. 553141) for 10 min on ice and then washed and centrifuged at 775g for
5 min. Fluorescently labelled antibodies (fluorescein isothiocyanate (FITC) rat
anti-mouse CD11b antibody, BD Pharmingen, clone M1/70, catalogue no. 557396;
anti-mouse phycoerythrin (PE)-cyanine7 CD11c, eBioscience, clone N418, catalogue
no. 25-0114-82; and allophycocyanin (APC) anti-mouse MHC class II (I-A/I-E),
eBioscience, clone M5/114.15.2, catalogue no. 17-5321-81) (1:200) were added
and incubated for 20 min on ice. After washing, cells were fixed by incubating in
fixation buffer (eBioscience) for 30 min on ice. Following a further washing step,
cells were suspended in FACS buffer, filtered using 50 µm filcons (Günter Keul)
and sorted using the Gallios Flow Cytometer (Beckman Coulter). Data analyses were
carried out using FlowJo software (version 8.8.6).

Sample preparation and pyrosequencing. Stool samples were freshly collected
and immediately frozen in liquid nitrogen. Genomic DNA was isolated from 200 mg
frozen faecal samples using a QIAamp DNA Stool Mini Kit (QIAGEN). PCR
amplification of the V1-V3 region of bacterial 16S ribosomal DNA was carried out
using primers (338F, 5'-ACTCCTACGGGAGGCAGCAG-3', 806R,
5'-GGACTACCAGGGTATCTAAT-3') incorporating FLX Titanium adaptors (Roche) and
a sample barcode sequence. The PCR was performed according to the ‘Amplicon Library
Preparation Method’ (Roche) using a FastStart High Fidelity PCR System (Roche),
10 mM PCR Nucleotide Mix (Roche) and 10 ng stool DNA per reaction in a 25 µl
reaction volume. After purification of the product on a 1.2% agarose gel, equal
concentrations of amplicons were pooled from each sample. Emulsion PCR and GS
FLX amplicon sequencing were carried out according to the Roche Titanium series
chemistry.

Pyrosequenced amplicon libraries were screened for quality characteristics,
chimaeric sequences, and PCR/pyrosequencing-induced duplication artefacts using
AmpliconNoise and Perseus software (version 1.28) as previously described”. Cleaned
data sets were processed using the QIIME amplicon analysis pipeline (version 1.7.0)".
Briefly, sequences were clustered into distance-based (97% similarity) operational
taxonomic units (OTUs) using mothur software (version 1.30.2)”. Representatives
of each OTU were aligned to the Greengenes core set alignment*’ using PyNAST
software (version 1.1)**. Taxonomic assignments for each OTU were made using
the Ribosomal Database Project (RDP) Classifier (version 2.2)”.

LEfSe results for microbiomes. LEfSe is an algorithm for applying 16S rRNA gene 
data sets to detect bacterial organisms and functional characteristics that are dif- 
ferentially abundant between two or more microbial environments”. It emphasizes 
statistical significance, biological consistency and effect relevance, allowing the iden- 
tification of differentially abundant features that are also consistent with biologically 
meaningful categories (subclasses). LEfSe first robustly identifies features that are 
significantly different among biological classes. It then carries out additional tests 
to assess whether these differences are consistent with respect to expected biolog- 
ical behaviour. The non-parametric factorial Kruskal-Wallis (KW) rank-sum test 
is used to detect features with significant differential abundance with respect to the 
class of interest; biological consistency is subsequently investigated using a set of 
pairwise tests among subclasses using the (unpaired) Wilcoxon rank-sum test. As 
a last step, LEfSe uses linear discriminant analysis (LDA) to estimate the effect size 
of each differentially abundant feature. Samples were assessed in a blinded manner. 
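
To make the first LEfSe step more concrete, the sketch below computes the Kruskal-Wallis H statistic for one feature across classes, using average ranks for ties and the standard tie correction. It is only an illustration of that single step, not the LEfSe software, and it omits the pairwise Wilcoxon tests and the LDA effect size. The function name and example abundances are hypothetical. The P value would be obtained by comparing H with a chi-squared distribution with (number of classes minus 1) degrees of freedom, which is not implemented here.

```python
def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic.

    `groups` is a list of lists, one list of abundances per class.
    Raises ZeroDivisionError if every observation is identical.
    """
    # Pool all observations, remembering which group each came from.
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied observations occupy ranks i+1 .. j and share their average.
        avg_rank = (i + 1 + j) / 2
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(values) for r, values in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction


if __name__ == "__main__":
    # Hypothetical relative abundances of one taxon in three classes.
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```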
SCFA measurement. Stool samples were freshly collected and immediately frozen 
in liquid nitrogen. A 1:5 dilution of the samples in double distilled water was
centrifuged, and the supernatant was mixed with 12 mM isobutyric acid, 1 M NaOH and
0.36 M HClO₄. After lyophilization for 16 h, the remaining powder was diluted
with acetone and 5 M formic acid and centrifuged, and the supernatant was used
for the measurement with an HP 5890 Series II gas chromatograph (Hewlett
Packard). Samples were assessed in a blinded manner.

Histone acetylation. Histones were extracted using an EpiQuik Total Histone Ex- 
traction Kit (Epigentek, catalogue no. OP-0006). H3 or H4 acetylation was determined 
using an EpiQuik Global Histone H3 (or H4) Acetylation Assay Kit (Epigentek, 
catalogue no. P-4008 and P-4009), according to the manufacturer’s instructions. 
Statistical analysis. Outcomes on sufficiently metric scales (cell percentage, mes- 
senger RNA levels, SCFA levels, percentage histone acetylation) were assessed with 
linear models (one-way ANOVA), modelling the mean value of each group, which 
were defined through mutation, diet and diet supplementation. Relevant compar- 
isons of mean values between groups were made with two-sided pairwise t-tests; 
the null hypotheses stated that there was no difference in the means. P values were 
adjusted for the number of comparisons by using the Bonferroni method. 
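
The Bonferroni adjustment mentioned above amounts to multiplying each P value by the number of comparisons and capping the result at 1. A minimal sketch, with a hypothetical function name and made-up P values:

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted P values (each multiplied by the number of tests, capped at 1)."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


if __name__ == "__main__":
    # Three hypothetical pairwise comparisons between group means.
    print(bonferroni([0.01, 0.04, 0.5]))
```

This controls the family-wise error rate and is more conservative than false discovery rate control.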

Outcomes on a discrete/ordinal scale (histological scores) were assessed
using Fisher’s exact test, Welch’s t-test or one-way ANOVA models, as above.
Fisher’s test was chosen when one of the groups comprised only values on a single
outcome level, because in such a case the assumption of equal variances is hardly 
violated. A two-sided Fisher’s test was applied to a two-by-two contingency table 
classifying the histological scores as either no pathology or any pathology. The 
ANOVA method was chosen when both groups had observations on at least two 
levels of the histological score and mimicked the Cochran-Mantel-Haenszel tests 
for linear trends in ordinal outcomes. Adjustments on pairwise t-tests were made 
using the single-step method*’. Both types of adjustments control the family-wise 
error rate of making one or more false discoveries. 
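
The two-sided Fisher's exact test on a two-by-two table (no pathology versus any pathology) can be written out directly from the hypergeometric distribution. The sketch below is illustrative only: the counts are invented, the function name is hypothetical, and the two-sided P value uses the common convention of summing all tables that are no more probable than the observed one, which the text does not specify.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Rows are groups; columns are 'any pathology' and 'no pathology'.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed table;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Invented counts: 0 of 7 treated mice and 5 of 7 untreated mice with pathology.
    print(fisher_exact_two_sided(0, 7, 5, 2))
```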

Outcomes measured over several time points (the GTT, insulin values and the 
weight curve) were assessed for differences between groups (LSL-K-ras^G12D/+ control
mice and K-ras^G12Dint mice) with linear models. The models estimated the mean
values of each group-time combination, and t-tests were carried out for every pair 
of mean values at a specific time point. P values were adjusted for multiple testing 
within each time series: that is, for 5 to 6 tests for glucose and insulin, and for 12 
tests for the weight measurements. Adjustment was carried out with the single- 
step method as mentioned above, which is implemented in the glht (general linear 
hypothesis) function in the R package multcomp. 

The number of mice per group used in an experiment is annotated in the corre- 
sponding figure legend as n. Although no prior sample size estimation was per- 
formed, we have used as many mice per group as possible. No gender differences 
were observed. The significance of tests is reported as NS (not significant),
* (P ≤ 0.05), ** (P ≤ 0.01) and *** (P ≤ 0.001). Statistical tests were carried out using Prism4
(GraphPad Software) and R (R Core Team, 2013).

Extended Data Figure 1 | An HFD accelerates carcinogenesis independently
of obesity and insulin resistance. a, Primary tumours metastasized to the liver,
pancreas and spleen in K-ras^G12Dint mice maintained on an HFD for 43 weeks.
b, Immunohistochemistry staining of duodenal samples using an antibody
specific for 5-bromodeoxyuridine (BrdU) showed increased proliferation in
K-ras^G12Dint mice. c, Weight curves for LSL-K-ras^G12D/+ controls (n = 5) and
K-ras^G12Dint (n = 6) littermate animals showed no difference in weight gain
under the normal diet (ND) condition, although K-ras^G12Dint mice (n = 7)
remained significantly leaner when fed on an HFD (LSL-K-ras^G12D/+ controls
n = 4). P values were determined by t-test and adjusted for multiple testing. The
error bars indicate s.e.m. *, P ≤ 0.05; **, P ≤ 0.01. d, In accordance with the
weight curves, the response to glucose overload and insulin secretion during
a glucose tolerance test (GTT) remained similar between the two groups under
the ND condition; however, K-ras^G12Dint mice remained insulin sensitive on an
HFD (ND, LSL-K-ras^G12D/+ controls n = 5, K-ras^G12Dint mice n = 3; HFD,
LSL-K-ras^G12D/+ controls n = 8, K-ras^G12Dint mice n = 5). P values were
determined by t-test and adjusted for multiple testing. The error bars indicate
s.e.m. *, P ≤ 0.05; **, P ≤ 0.01. The results are representative of two to three
independent experiments. e, Together with resistance to diet-induced obesity,
K-ras^G12Dint mice showed microvesicular steatosis, in contrast to the littermate
controls (which had macrovesicular steatosis), suggesting decreased lipid
accumulation in the liver of K-ras^G12Dint mice. f, Messenger RNA expression
levels of F4/80 and TNF-α were analysed by reverse transcriptase (RT)-PCR
(HFD, LSL-K-ras^G12D/+ controls n = 3, K-ras^G12Dint mice n = 6). Plasma
TNF-α levels determined by enzyme-linked immunosorbent assay (ELISA)
showed decreased levels in K-ras^G12Dint mice (HFD, LSL-K-ras^G12D/+ controls
n = 7, K-ras^G12Dint mice n = 11). P values were determined by t-test.
The error bars indicate s.e.m. *, P ≤ 0.05.

Extended Data Figure 2 | The host immune response is dampened in
K-ras^G12Dint mice. a, Relative mRNA expression levels of genes involved in the
immune response were analysed by RT-PCR in duodenal samples from
mice under the ND or the HFD regimen (ND, LSL-K-ras^G12D/+ controls n = 3,
K-ras^G12Dint mice n = 3; HFD, LSL-K-ras^G12D/+ controls n = 3, K-ras^G12Dint
mice n = 6). P values were determined by one-way analysis of variance
(ANOVA) and adjusted for the number of comparisons by using the
Bonferroni method. The error bars indicate s.e.m. **, P < 0.01. *, P < 0.05 and
*# P < 0.01 compared with littermate controls. b, Azure eosin staining of
duodenal samples showed decreased amounts of granules with antimicrobial
peptides in the crypts of K-ras^G12Dint mice. c, The expression of differentiation
markers for Paneth cells (cryptdin), epithelial cells (sucrase-isomaltase) and
enteroendocrine cells (synaptophysin), as well as mucins in the duodenum, was
analysed by RT-PCR under the ND or the HFD regimen (ND, LSL-K-ras^G12D/+
controls n = 3, K-ras^G12Dint mice n = 3; HFD, LSL-K-ras^G12D/+ controls n = 3,
K-ras^G12Dint mice n = 4). P values were determined by one-way ANOVA
and adjusted for the number of comparisons by using the Bonferroni method.
The error bars indicate s.e.m. *, P < 0.05. ***, P < 0.001 compared with
littermate controls. d, Flow cytometric analysis of cells from the lamina propria
(LP) and Peyer’s patches (PP) indicated the presence of CD11c+ dendritic
cells (DCs) and the expression of MHC class II molecules in K-ras^G12Dint mice
and littermate controls on the ND or the HFD regimen (n = 2). P values
were determined by one-way ANOVA and adjusted for the number of
comparisons by using the Bonferroni method. The error bars indicate s.e.m.
*, P= 0.05; **, P<0.01. *, P= 0.05 compared with littermate controls. 
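
Several of these legends state that P values were "adjusted for the number of comparisons by using the Bonferroni method". As a minimal sketch of what that adjustment does (not the authors' analysis code, which is not given here), each raw P value is multiplied by the number of comparisons and capped at 1. The function names and the example P values below are hypothetical, and the star thresholds are an assumption chosen only for illustration.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted P values: raw P times the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significance_label(p_adjusted):
    """Map an adjusted P value to a star label (thresholds are illustrative assumptions)."""
    if p_adjusted <= 0.001:
        return "***"
    if p_adjusted <= 0.01:
        return "**"
    if p_adjusted <= 0.05:
        return "*"
    return "NS"


if __name__ == "__main__":
    # Hypothetical raw P values from four pairwise comparisons.
    raw = [0.0002, 0.004, 0.020, 0.500]
    for p_raw, p_adj in zip(raw, bonferroni_adjust(raw)):
        print(f"raw={p_raw:.4f} adjusted={p_adj:.4f} {significance_label(p_adj)}")
```

The correction is conservative: with four comparisons, a raw P of 0.020 is no longer below 0.05 after adjustment.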
Extended Data Figure 3 | An HFD leads to community changes in the gut
microbiota. a, b, Linear discriminant analysis (LDA) effect size (LEfSe) results
showed bacteria that were significantly different in abundance between
K-rasG12Dint mice and littermate controls on the HFD and indicated the effect
size of each differentially abundant bacterial taxon in the small intestine
(LSL-K-rasG12D/+ controls, n = 3; K-rasG12Dint mice, n = 8) (a) and colon
(LSL-K-rasG12D/+ controls, n = 5; K-rasG12Dint mice, n = 7) (b). c, The relative
abundance of Escherichia/Shigella spp. and Helicobacter spp. in the small
intestine of K-rasG12Dint mice on the ND or the HFD.
Extended Data Figure 4 | Systemic deletion of Myd88 prevents tumour
progression in K-rasG12Dint mice. a, Histological scores for the small intestine
showed complete lack of tumour progression in HFD-fed K-rasG12Dint mice
with systemic Myd88 deletion. However, tissue-specific deletion of Myd88 did
not confer any protection against tumour progression in K-rasG12Dint mice
(Myd88−/− LSL-K-rasG12D/+ controls, n = 19; Myd88−/− K-rasG12Dint mice,
n = 15; Myd88ΔIEC LSL-K-rasG12D/+ controls, n = 5; Myd88ΔIEC K-rasG12Dint
mice, n = 7; K-rasG12Dint + WT BM mice, n = 7; K-rasG12Dint + Myd88−/− BM mice,
n = 8). Each point represents one animal, and the lines indicate means.
A two-sided Fisher test was applied. Adjustments on pairwise t-tests were made
using the single-step method. ***, P < 0.001; NS, not significant. Myd88ΔIEC
K-rasG12Dint, K-rasG12Dint mice with IEC-specific deletion of Myd88;
K-rasG12Dint + WT BM, K-rasG12Dint mice transplanted with wild-type bone
marrow; K-rasG12Dint + Myd88−/− BM, K-rasG12Dint mice transplanted with
Myd88-deficient bone marrow. b, LEfSe results showed bacteria that were
significantly different in abundance among HFD-fed K-rasG12Dint mice with
(n = 8) or without (n = 4) systemic Myd88 deletion. Peptostreptococcaceae,
Deferribacteraceae and Ruminococcaceae in the small intestine, as well as
Peptostreptococcaceae and Deferribacteraceae in the colon, became abundant
after Myd88 deficiency (K-rasG12Dint mice, n = 7; Myd88−/− K-rasG12Dint mice,
n = 4). c, Flow cytometric analysis of LP cells indicated that the decreased
recruitment of, and surface antigen presentation by, CD11c+ DCs following the
HFD was partially attenuated after Myd88 deletion (Myd88−/− LSL-K-rasG12D/+
controls, n = 8; Myd88−/− K-rasG12Dint mice, n = 4; LSL-K-rasG12D/+
controls, n = 2; K-rasG12Dint mice, n = 2). P values were determined by one-way
ANOVA and adjusted for the number of comparisons by using the Bonferroni
method. The error bars indicate s.e.m. *, P < 0.05; **, P < 0.01.
Extended Data Figure 5 | Bacterial composition differs between
K-rasG12Dint mice with tissue-specific and systemic deletion of Myd88. LEfSe
results showed that the composition of the small intestinal microbiota was
affected by Myd88 deletion in the IECs (a) or the haematopoietic cells (b, c)
(Myd88ΔIEC K-rasG12Dint mice, n = 4; K-rasG12Dint + WT BM mice, n = 4;
K-rasG12Dint + Myd88−/− BM mice, n = 4; K-rasG12Dint mice, n = 8).
Extended Data Figure 6 | An HFD decreases SCFA concentrations. a, The
HFD decreased the acetate, butyrate and propionate concentrations in
stool samples from K-rasG12Dint mice and littermate controls, whereas the
isovaleric acid and valeric acid concentrations increased (ND, LSL-K-rasG12D/+
controls n = 6, K-rasG12Dint mice n = 8; HFD, LSL-K-rasG12D/+ controls n = 7,
K-rasG12Dint mice n = 11). P values were determined by one-way ANOVA and
adjusted for the number of comparisons by using the Bonferroni method.
The error bars indicate s.e.m. *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001. b, SCFA
concentrations of small intestinal samples (LSL-K-rasG12D/+ controls, n = 4;
K-rasG12Dint mice, n = 4) and colonic samples (LSL-K-rasG12D/+ controls,
n = 3; K-rasG12Dint mice, n = 5) from K-rasG12Dint mice and littermate controls on
the HFD supplemented with arabinogalactan. P values were determined by
one-way ANOVA followed by Bonferroni's multiple comparison test. The
error bars indicate s.e.m. *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001. c, Flow
cytometric analysis of cells from the LP and the MLNs revealed compromised
DC recruitment and decreased surface antigen presentation in mice fed the
HFD supplemented with GOS (HFD, LSL-K-rasG12D/+ controls n = 2,
K-rasG12Dint mice n = 2; HFD + GOS, LSL-K-rasG12D/+ controls n = 5,
K-rasG12Dint mice n = 3). P values were determined by one-way ANOVA and
adjusted for the number of comparisons by using the Bonferroni method.
The error bars indicate s.e.m. The differences are not significant.
Extended Data Figure 7 | Prebiotic supplementation does not protect
K-rasG12Dint mice against HFD-induced tumorigenesis. a, Relative mRNA
expression levels for genes involved in the immune response and those that
encode mucins and differentiation markers for Paneth, enteroendocrine and
epithelial cells in duodenal samples (HFD + GOS, LSL-K-rasG12D/+ controls
n = 5, K-rasG12Dint mice n = 6; HFD + antibiotics (Abx), LSL-K-rasG12D/+
controls n = 5, K-rasG12Dint mice n = 6). P values were determined by one-way
ANOVA and adjusted for the number of comparisons by using the Bonferroni
method. The error bars indicate s.e.m. *, P < 0.05. #, P ≤ 0.05, ##, P < 0.01
and ###, P < 0.001 compared with littermate controls. b, Prebiotic
supplementation had little or no effect on stool SCFA concentrations (HFD,
LSL-K-rasG12D/+ controls n = 7, K-rasG12Dint mice n = 11; HFD + GOS,
LSL-K-rasG12D/+ controls n = 5, K-rasG12Dint mice n = 6). P values were determined
by one-way ANOVA and adjusted for the number of comparisons by using
the Bonferroni method. The error bars indicate s.e.m. *, P ≤ 0.05;
***, P < 0.001. ##, P < 0.01 and ###, P < 0.001 compared with littermate
controls.
Extended Data Figure 8 | Butyrate attenuates K-Ras signalling and has
systemic effects on metabolic parameters. a, Sodium butyrate treatment only
slightly affected butyrate and propionate concentrations in the stool compared
with prebiotic supplementation (HFD, LSL-K-rasG12D/+ controls n = 7,
K-rasG12Dint mice n = 11; HFD + butyrate, LSL-K-rasG12D/+ controls n = 6,
K-rasG12Dint mice n = 5). P values were determined by one-way ANOVA and
adjusted for the number of comparisons by using the Bonferroni method.
The error bars indicate s.e.m. *, P < 0.05; **, P < 0.01; ***, P < 0.001. #,
P < 0.05, ##, P < 0.01 and ###, P < 0.001 compared with littermate controls.
b, The higher proliferation levels observed in the duodenum of K-rasG12Dint
mice were decreased following butyrate supplementation. c, The expression of
differentiation markers and mucins in the duodenum (HFD, LSL-K-rasG12D/+
controls n = 3, K-rasG12Dint mice n = 4; HFD + butyrate, LSL-K-rasG12D/+
controls n = 3, K-rasG12Dint mice n = 3). P values were determined by one-way
ANOVA and adjusted for the number of comparisons by using the Bonferroni
method. The error bars indicate s.e.m. *, P < 0.05; **, P < 0.01. #, P ≤ 0.05
and ##, P < 0.01 compared with littermate controls. d, Butyrate had systemic
effects and protected K-rasG12Dint mice and littermate controls against
HFD-induced hyperinsulinaemia (n = 5 per group). Data were assessed by t-test,
and P values were adjusted for multiple testing. The error bars indicate s.e.m.
*, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001. e, Butyrate treatment did not affect
H3 or H4 acetylation (n = 3-8). The data were assessed by one-way ANOVA,
and P values were adjusted for the number of comparisons by using the
Bonferroni method. The error bars indicate s.e.m. The differences are not
significant.
Extended Data Figure 9 | HFD-induced dysbiosis and the associated cancer
risk can be transferred to K-rasG12Dint mice on a normal diet. a, Following 1
week of antibiotic cocktail treatment, K-rasG12Dint mice and littermate controls
(approximately 7 weeks of age) on the ND regimen were gavaged three
times a week with fresh stool pellets from HFD-fed mutants (which had
been HFD fed for 24 weeks on the day of first transfer) for a total of 15 weeks.
Haematoxylin and eosin staining of duodenal samples from three gavaged
K-rasG12Dint mice show mSA-LGD and mSA-HGD, as well as invasive
carcinoma development. b, LEfSe results showed bacteria that were
significantly different in abundance between ND-fed K-rasG12Dint mice that
had been gavaged with stool samples from HFD-fed K-rasG12Dint donors and
ND-fed K-rasG12Dint mice that had not been gavaged. Helicobacteraceae,
Enterococcaceae and Deferribacteraceae became more abundant in the colon
after stool transfer (K-rasG12Dint mice + ND, n = 3; K-rasG12Dint mice + ND +
stool, n = 6). c, Flow cytometric analysis of cells from the PP and MLNs
showed antigen presentation by CD11c+ DCs (ND, LSL-K-rasG12D/+ controls
n = 2, K-rasG12Dint mice n = 2; ND + stool, LSL-K-rasG12D/+ controls n = 7,
K-rasG12Dint mice n = 10). P values were determined by one-way ANOVA
and adjusted for the number of comparisons by using the Bonferroni method.
The error bars indicate s.e.m. ***, P ≤ 0.001. #, P ≤ 0.05 and ##, P ≤ 0.01
compared with littermate controls. d, Glucose clearance during a GTT in
ND-fed K-rasG12Dint mice and littermate controls (n = 5 per group) that had
received stool samples from HFD-fed K-rasG12Dint donors. The results were
analysed by t-test. P values were adjusted for multiple testing. The error bars
indicate s.e.m. The differences are not significant.
Extended Data Figure 10 | Antibiotic treatment blocks HFD-induced
tumorigenesis in K-rasG12Dint mice. a, The expression profiles of selected
genes involved in antigen recognition, immune response, immune cell
recruitment, differentiation markers and mucins in duodenal samples from
LSL-K-rasG12D/+ and K-rasG12Dint littermate animals (ND, LSL-K-rasG12D/+
controls n = 3, K-rasG12Dint mice n = 3; ND + stool, LSL-K-rasG12D/+ controls
n = 3, K-rasG12Dint mice n = 5). P values were determined by one-way ANOVA
and adjusted for the number of comparisons with the Bonferroni method. The
error bars indicate s.e.m. ***, P ≤ 0.001. #, P ≤ 0.05, ##, P ≤ 0.01 and ###,
P ≤ 0.001 compared with littermate controls. b, Fluorescence-activated cell
sorting (FACS) analysis of MLN cells indicates recruitment of, and antigen
presentation by, CD11c+ dendritic cells following treatment with antibiotics
(HFD, LSL-K-rasG12D/+ controls n = 2, K-rasG12Dint mice n = 2; HFD + Abx,
LSL-K-rasG12D/+ controls n = 5, K-rasG12Dint mice n = 7). P values were
determined by one-way ANOVA and adjusted for the number of comparisons
with the Bonferroni method. The error bars indicate s.e.m. ***, P ≤ 0.001. The
results are not significant. c, The mechanistic scheme suggests that
HFD-induced changes in the bacterial community, SCFA levels and mucin
profiles cooperate with an oncogene-associated decrease in host immunity,
collectively enhancing carcinogenesis in the small intestine.
Contrasting roles of histone 3 lysine 27 demethylases
in acute lymphoblastic leukaemia

T-cell acute lymphoblastic leukaemia (T-ALL) is a haematological
malignancy with a dismal overall prognosis, including a relapse rate
of up to 25%, mainly because of the lack of non-cytotoxic targeted
therapy options. Drugs that target the function of key epigenetic factors
have been approved in the context of haematopoietic disorders (ref. 1),
and mutations that affect chromatin modulators in a variety of leukaemias
have recently been identified (refs 2, 3); however, 'epigenetic' drugs
are not currently used for T-ALL treatment. Recently, we described that
the polycomb repressive complex 2 (PRC2) has a tumour-suppressor
role in T-ALL (ref. 4). Here we delineated the role of the histone 3 lysine 27
(H3K27) demethylases JMJD3 and UTX in T-ALL. We show that
JMJD3 is essential for the initiation and maintenance of T-ALL, as
it controls important oncogenic gene targets by modulating H3K27
methylation. By contrast, we found that UTX functions as a tumour
suppressor and is frequently genetically inactivated in T-ALL. Moreover,
we demonstrated that the small molecule inhibitor GSKJ4 (ref. 5)
affects T-ALL growth, by targeting JMJD3 activity. These findings
show that two proteins with a similar enzymatic function can have
opposing roles in the context of the same disease, paving the way for
treating haematopoietic malignancies with a new category of epigenetic
inhibitors.

In recent studies, we and other researchers have revealed that PRC2 
has a key tumour-suppressor function, catalysing the methylation of 
H3K27 (refs 2, 4, 6). Since net H3K27me3 levels are dictated by the 
balance between histone methylation and active histone demethylation, 
we hypothesized that the removal of methyl groups from H3K27 is also 
an important process in T-ALL progression. We therefore investigated 
the possible roles of H3K27 demethylases in T-ALL (see Supplemen- 
tary Notes for an extended introduction). Ubiquitously transcribed tet- 
ratricopeptide repeat X-linked protein (UTX)”* (also known as KDM6A) 
is a ubiquitously expressed protein that controls the basal levels of
H3K27me3 and the induction of ectoderm and mesoderm differentiation””®
and is essential for somatic cell reprogramming’. Jumonji D3 (JMJD3;
also known as KDM6B) is induced upon inflammation” or exposure to
viral and oncogenic stimuli'*”, and it controls neuronal and epidermal 
differentiation’*’° and inhibits reprogramming’. UTX is a tumour sup- 
pressor in several solid tumours*'* *’. However, the roles of these two 
demethylases as direct modulators of the oncogenic state are largely 
uncharacterized’? 


We have generated and studied NOTCH1-induced T-ALL animal
models* (Fig. 1a), because activating mutations of NOTCH1 are a defining
feature of T-ALL”. Jmjd3 messenger RNA and protein expression
levels were significantly higher in leukaemic cells than in untransformed
CD4+CD8+ (double positive) control T cells, which exhibit low levels
of active NOTCH1, whereas Utx (and Ezh2)* expression did not change
significantly (Fig. 1b, c and Supplementary Table 1) upon transformation.
It has previously been shown that the transcription factor nuclear
factor-κB (NF-κB) controls JMJD3 expression during inflammation”
and that NOTCH1 induces the NF-κB pathway in T-ALL”. Here we
showed increased expression of the p65 subunit (also known as RELA)
of NF-κB and binding of p65, but not of NOTCH1, to Jmjd3 control
elements in mouse T-ALL cells (Extended Data Fig. 1a, b). Modulation of
the levels of intracellular NOTCH or the activity of the NF-κB pathway
significantly decreased the amount of NF-κB bound to the Jmjd3
elements, as well as Jmjd3 mRNA expression (Extended Data Fig. 1b-f).
We then probed for JMJD3 binding to specific oncogenic loci, which
has previously been shown to be important in T-ALL*. We found that
JMJD3 binding was highly enriched on the Hes1 promoter (Fig. 1d, left),
and this binding depended on the activation of the NOTCH1 pathway
and negatively correlated with the H3K27me3 levels (Extended Data
Fig. 1g, h).

Analyses of human leukaemia cases****° have shown that JMJD3 is
more highly expressed in T-ALL cells than in normal T-cell progenitors”
or in other types of leukaemia”*”’, which is similar to the expression of
the classic NOTCH1 target HES1 (Fig. 1e). Genes that are co-expressed
with JMJD3 in human primary samples were found to exhibit loss of
H3K27me3 during leukaemia progression (Extended Data Fig. 1i), suggesting
a connection between the expression of JMJD3 and the H3K27me3
levels on specific targets.

Chromatin immunoprecipitation followed by sequencing (ChIP-seq)
studies in human T-ALL cells (the cell line CUTLL1) showed that JMJD3
was bound to important NOTCH1 targets with oncogenic functions
(such as HEY1, NRARP and HES1) (Fig. 1f). There was a significant
co-occupancy of JMJD3 with NOTCH1 (ref. 27) (33% of the top JMJD3
peaks were occupied by NOTCH1, a 6.9-fold enrichment over control,
P < 1 × 10 °), the NOTCH1 partner RBP-Jκ and the activating mark
H3K4me3 (ref. 27) (Extended Data Fig. 1j). The majority of JMJD3
binding sites were localized around the transcription start sites (TSSs) of
Figure 1 | JMJD3 is highly expressed in T-ALL and controls the expression
of important oncogenic targets. a, Size comparison of the spleens (left) and
haematoxylin and eosin staining of the liver (centre) of healthy (WT, top)
and leukaemic (T-ALL, bottom) mice. The arrows denote leukaemic
infiltration of the liver of the T-ALL mouse. Scale bar, 50 μm. Representative
samples from n = 3 mice are shown. b, c, Protein (b) and transcript (c) levels of
the demethylases JMJD3 and UTX in control T cells (CD4+CD8+ (double
positive) thymocytes) and mouse T-ALL cells. Representative samples (b) or
the mean ± s.d. (c) of three mice is shown; values were normalized according to
the sample with the highest expression value. d, ChIP for JMJD3 on the
Hes1 promoter in control T cells and mouse T-ALL cells (left) and upon
γ-secretase inhibitor (γSI) treatment of T-ALL cells (right) (n = 3); data are
shown as mean ± s.d. DMSO, dimethylsulphoxide. e, Expression analysis of
JMJD3 and HES1 among samples of acute T-cell leukaemia (T-ALL;
83 samples), acute B-cell leukaemia (B-ALL; 23) and acute myeloid leukaemia
(AML; 537), as well as physiological T-cell subsets (24)** (quantile
normalization across samples, see Methods). The data are shown as
mean ± s.d. The P values (Wilcoxon test) are as follows: for JMJD3, T-ALL
versus physiological T cells, 4.0 × 10°; T-ALL versus AML, 1.1 × 10−13;
T-ALL versus B-ALL, 2.2 × 10°; and for HES1, T-ALL versus physiological
T cells, 3.7 × 10 *; T-ALL versus AML, 3.5 × 10“; T-ALL versus B-ALL,
1.3 × 10°. ***, significant. f, Snapshots of JMJD3 binding in human T-ALL.
Three NOTCH1 targets and the interferon-β (IFNB) gene (negative control)
are shown. Chr, chromosome.


genes (Extended Data Fig. 1k) in a fashion similar to NOTCH1 binding
sites’’. These results suggest a key role for JMJD3 in oncogenic programs
in T-ALL, through interaction with NOTCH1. Protein immunoprecipitation
studies in 293T cells (human embryonic kidney cells), as well
as in mouse T-ALL cell lines, showed that JMJD3 is part of the NOTCH1
transcriptional complex, as it interacts directly with NOTCH1 and
MAML1 (Extended Data Fig. 2a–c). By contrast, there was no NOTCH1
interaction with EZH2 or UTX. As JMJD3 has been shown to be a
Figure 2 | Dissecting the oncogenic role of JMJD3 in T-ALL. a, b, The
protein levels of JMJD3 (a) and UTX (b) in human T-ALL cells (CUTLL1)
expressing the corresponding shRNAs against the two demethylases.
Representative blots from three independent studies (biological replicates) are
shown. c, Effects on human T-ALL cell proliferation of shRNA treatment, as
measured by loss of green fluorescent protein (GFP)-expressing shRNA. For
all cell lines, the mean ± s.d. from three representative studies is shown.

d, Differential expression analysis upon knockdown of JMJD3 in T-ALL (top). 
The loci of the downregulated genes exhibit an increase in H3K27me3 (red 
dots, bottom), whereas the upregulated genes exhibit a decrease in H3K27me3. 
The data shown are representative of three independent studies. FPKM, 
fragments per kilobase of transcript per million fragments mapped. e, In vivo 
growth of P12 T-ALL cells in intravenous xenograft studies upon genomic 
ablation of JMJD3 (left) and with a Renilla control (centre). One million P12 
cells were injected into each of seven animals. Sublethally irradiated NRG 
(immunocompromised) mice were used as recipients, and transplanted 
leukaemic cell growth was compared with the baseline (day 0). Day 0 was set as 
the first day when substantially detectable luciferase intensity was measured. 
The last day of the experiment was the day that either the luciferase intensity 
reached saturation or the mice were killed for humane reasons. 

Horizontal bars, means. 
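
The legend above defines FPKM as fragments per kilobase of transcript per million fragments mapped. As an illustrative sketch of that definition only (the authors' RNA-seq pipeline is not described in this section), the unit can be computed as follows. The function name and the example numbers are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million fragments mapped.

    FPKM = fragments / (length in kb * mapped fragments in millions)
         = fragments * 1e9 / (length in bp * mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical transcript: 500 fragments on a 2,000-bp transcript
    # in a library of 20 million mapped fragments.
    print(fpkm(500, 2_000, 20_000_000))
```

Dividing by both transcript length and library size is what makes values comparable between genes of different length and between samples sequenced to different depth.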


member of MLL complexes’”, we tested whether JMJD3 interacted with
WDR5, a key subunit of the MLL complex. We found that JMJD3
interacted with WDR5 (Extended Data Fig. 2b), suggesting a potential
NOTCH1-JMJD3-MLL complex on target promoters.

To clarify the role of JMJD3 and UTX in the maintenance of leukaemia,
we performed genomic knockdown of JMJD3 in human T-ALL
cells using two short hairpin RNAs (shRNAs) (Fig. 2a, b and Extended
Data Fig. 2d). Treatment with shJMJD3 but not shUTX affected the viability
of leukaemic cells, as shown in loss of representation studies and
apoptosis assays, and this finding is in contrast to the viability of myeloid
leukaemia lines used as controls (Fig. 2c and Extended Data Fig. 2e, f).
The expression of NOTCH1 targets was negatively affected by shJMJD3,
Figure 3 | The demethylase UTX acts as a tumour suppressor in T-ALL.
a-c, Monitoring the initiation and progression of T-cell leukaemia in a
NOTCH1-overexpressing model of T-ALL. Leukaemic blasts (expressed as
NOTCH1-IC-GFP-positive cells) in the peripheral blood (a, mean ± s.d.) and
in a blood smear (b) and leukaemic cell infiltration of the liver (c) of male
wild-type (Utx+/Y, n = 10) and knockout (Utx−/Y, n = 6) mice are shown.
NOTCH1-IC, intracellular part of NOTCH1. d, Survival studies of mice
transplanted with haematopoietic progenitors from the wild-type (Utx+/Y,
n = 10) and knockout (Utx−/Y, n = 6) backgrounds expressing NOTCH1-IC.
e, Scatter plot summarizing the major genome-wide expression differences
between T-ALL tumours of the wild-type (Utx+/Y) and knockout (Utx−/Y)


and this was accompanied by loss of JMJD3 and gain of H3K27me3 on 
their promoters (Extended Data Fig. 3a-e). Genome-wide expression 
analysis showed that more transcripts were significantly downregulated 
by shJMJD3 treatment than were upregulated (749 protein-coding genes 
versus 297; Fig. 2d, top, and Extended Data Fig. 3f), in agreement with 
the role of JMJD3 as a transcriptional activator. The downregulated genes 
were found to be significantly enriched in genes that gained H3K27me3 
on their promoters (Fig. 2d, bottom; P = 1.02 x 10’). The shUTX- 
downregulated and shUTX-upregulated gene signatures were reversed 
in terms of the gene numbers (46 downregulated and 189 upregulated 
protein-coding genes, compared with both shRenilla (control) and 
shJMJD3). Intriguingly, JMJD3 expression itself was significantly upregulated
upon UTX silencing (Extended Data Fig. 3a). Well-characterized
NOTCH1 targets, as well as genes in the NF-κB pathway, were downregulated
as part of the JMJD3 signature (Fig. 2d, top, and Extended
Data Fig. 3g). These findings were confirmed using additional human
T-ALL cell lines with high levels of oncogenic NOTCH1 activity”? (Extended
Data Fig. 3h, i). Subcutaneous or intravenous xenograft models
of T-ALL cell lines (CUTLL1, CEM and P12) treated with either of the
two shRNAs against JMJD3 (shJMJD3A and shJMJD3B) and transplanted
into immunocompromised mice (NRG mice; NOD Rag1−/−
Il2rg−/−) showed a significant growth disadvantage compared with
shRenilla-treated cell lines (Fig. 2e and Extended Data Fig. 4a–f).
Interestingly, silencing of UTX led to enhanced proliferation in many cases,
suggesting a possible tumour-suppressor function in vivo (Extended
Data Fig. 4g).
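
The paragraph above reports that the shJMJD3-downregulated genes were significantly enriched in genes that gained H3K27me3. This section does not say which test produced that P value. One common way to score such a gene-set overlap is a one-sided hypergeometric test together with a fold enrichment over the overlap expected by chance, and the sketch below assumes that approach. Only the count of 749 downregulated genes comes from the text; the function names, the gene universe size, the size of the H3K27me3-gaining set and the overlap are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(overlap, universe, set_a, set_b):
    """P(X >= overlap) when set_b genes are drawn from a universe containing set_a hits."""
    upper = min(set_a, set_b)
    total = comb(universe, set_b)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(
        comb(set_a, i) * comb(universe - set_a, set_b - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return tail / total


def fold_enrichment(overlap, universe, set_a, set_b):
    """Observed overlap divided by the overlap expected for independent gene sets."""
    expected = set_a * set_b / universe
    return overlap / expected


if __name__ == "__main__":
    universe = 20_000        # hypothetical number of protein-coding genes tested
    gained_h3k27me3 = 1_000  # hypothetical number of genes gaining H3K27me3
    downregulated = 749      # from the text: genes downregulated by shJMJD3
    overlap = 100            # hypothetical overlap between the two sets
    print(fold_enrichment(overlap, universe, gained_h3k27me3, downregulated))
    print(hypergeom_upper_tail(overlap, universe, gained_h3k27me3, downregulated))
```

Exact integer binomial coefficients keep the tail sum free of floating-point underflow, which matters when the P value is very small.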


backgrounds. RNA sequencing was performed using three pairs of wild-type
and Utx knockout (KO) NOTCH1-IC tumours (spleen and bone marrow).
f, g, Analysis of genetic status of the UTX (KDM6A) locus in paediatric T cell
leukaemia (n = 107). Affymetrix SNP6.0 microarrays (f) for assessing genomic
deletions. Illustration of the human UTX protein (g) depicting three frameshift
(fs) mutations in paediatric T-ALL (grey circles), as well as one in-frame
deletion (p.Ala14_Ala17del), one splice acceptor site mutation (exon (ex) 4
splice) and one missense mutation (p.Ile598Val) in adult T-ALL (white circles),
as identified by targeted Sanger sequencing. The jumonji domain (JmjC)
and the tetratricopeptide repeats are shown. SJTALL, St. Jude Children’s
Research Hospital sample depository of T-ALL samples.


To examine the potential roles of UTX and JMJD3 in the induction
of T-ALL, we performed bone marrow transplantation experiments
using haematopoietic stem cells from Utx and Jmjd3 germline knockout
mice. Although female Utx−/− mice die at E9.5 because of defects
in mesoderm development, a small fraction of male Utx−/Y mice survive
to adulthood as a result of compensation by UTY”*. Despite T-cell
development being largely unaffected (Extended Data Fig. 5a, b), T-ALL
kinetics were significantly faster on the Utx−/Y background, as determined
by leukaemic burden quantification in the peripheral blood and
infiltration of the spleen (data not shown) and liver (Fig. 3a–c and Extended
Data Fig. 5c-e). Moreover, mice succumbed to the disease with
a significantly shorter latency in the absence of Utx than in the Utx+/Y
and Utx+/+ genotypes (Fig. 3d and Extended Data Fig. 5f-h). These
experiments provide the first in vivo analysis of the tumour-suppressor
role of UTX in any tumour type.

To delineate the potential mechanism underlying UTX action, we
analysed the gene expression of sorted leukaemic blasts from the spleen
or bone marrow of wild-type (Utx+/Y/Utx+/+) or knockout (Utx−/Y) mice
(Fig. 3e). This analysis showed that UTX positively controls important
tumour-suppressor genes, such as retinoblastoma binding protein 6
(Rbbp6), the inhibitor of NOTCH1 pathway activity Fbxw7 and the
PRC2 member Suz12; by contrast, genes with an oncogenic role in T-ALL,
including Jmjd3, were upregulated (Fig. 3e and Extended Data
Fig. 5i). These studies strongly suggested that UTX might act as a tumour
suppressor in human T-ALL. We thus screened a panel of primary paediatric
T-ALL samples’ for genetic alterations of the UTX locus. Analysis
Figure 4 | Pharmacological targeting of T-ALL through specific inhibition
of the demethylase activity of JMJD3. a, Dose-dependent effect of the
inhibitor GSKJ4 (normalized to a control inhibitor, GSKJ5; ref. 5) on CUTLL1 cell
proliferation. The data are shown as mean ± s.d. b, Effect of GSKJ4 (at 2 μM)
on CUTLL1 T-ALL cells. The data are shown as mean ± s.d. c, Heatmap
representation of GSKJ4-associated changes in gene expression (left, three
biological replicates) and H3K27me3 changes (right) of 486 significantly
downregulated coding transcripts in CUTLL1 T-ALL cells over a period of 72 h.
Centre, Occupancy by JMJD3 and NOTCH1 and H3K4me3 marks in
respective 4-kilobase (kb)-flanked TSS regions (indicated in different colours
from each other). d, Comparison of the shJMJD3 and shUTX expression-based
signatures with the GSKJ4-induced expression changes. Note the highly
significant overlap between the genes downregulated by shJMJD3 and GSKJ4.
Different colours represent different levels of significance: red, high
significance; orange, medium significance; and yellow, low significance. e, Box
plots representing mean ± s.d. GSKJ4-induced H3K27me3 changes in JMJD3
target genes, as well as in the commonly downregulated genes in shJMJD3-
and GSKJ4-treated cells. Genes upregulated by both shJMJD3 and GSKJ4
treatments were used as the negative controls. The P values are as follows:
JMJD3 targets versus GSKJ4_up and shJMJD3_up, 2.7 × 10°; and
GSKJ4_down and shJMJD3_down versus GSKJ4_up and shJMJD3_up,
4.4 × 10−19. ***, significant.


of primary human samples of paediatric T-ALL using single nucleotide
polymorphism (SNP) arrays identified two patients with focal deletions
of the UTX locus (Fig. 3f). Further targeted sequencing in paediatric
and adult T-ALL led to the identification of six more patient cases with
UTX mutations (Fig. 3g, Extended Data Fig. 5j, k and Supplementary
Table 2), including in-frame deletions, missense (Ile598Val) mutations
and frameshift alterations. Analysis of bone marrow remission genomic
DNA confirmed the somatic origin of the UTX splice site mutation (Extended
Data Fig. 5k). Seven out of the eight alterations belonged to male
patients, further underlining that the roles of UTX and UTY do not
seem to be interchangeable. These genetic alterations are predicted to
have an inactivating role**”' and provide further evidence that UTX is
a tumour suppressor in T-ALL. Indeed, overexpression of UTX using
a doxycycline-inducible lentiviral system in T-ALL cell lines (Extended
Data Fig. 5l) led to suppression of tumour growth and a significant increase
in apoptosis (Extended Data Fig. 5m, n).

Jmjd3−/− mice” lack the catalytic domain of the JMJD3 protein (Extended
Data Fig. 6a, b) and die perinatally””. Haematopoiesis and T-cell
development were largely unaffected by the absence of JMJD3 (Extended
Data Fig. 6c–h). Genetic ablation of Jmjd3 in T-ALL led to fewer leukaemic
blasts in the peripheral blood, significantly reduced leukaemic
cell infiltration of the spleen and liver and improved survival rates in the
recipients (Extended Data Fig. 7a-f), consistent with Jmjd3 having an
oncogenic role. These striking phenotypes supported our previous
in vitro and in vivo findings and led us to further explore the therapeutic
potential of targeting JMJD3 activity in T-ALL.

We next tested whether the small molecule GSKJ4 (ref. 5), which is 
directed against JMJD3 and UTX (half-maximum inhibitory concen- 
tration (IC59) as determined by matrix-assisted laser desorption mass 
spectroscopy, JMJD3, 18 uM; UTX 56 uM; ref. 5), affects maintenance 
of the disease. We used GSKJ4 at the IC; determined for T-ALL cells 
(2 uM) (Fig. 4a) to treat a panel of T-ALL cell lines. GSKJ4 significantly 
affected the growth of human T-ALL cell lines and primary human T- 
ALL cells (T-ALL1-3), leading to cell cycle arrest and increased apoptosis 
compared with control-inhibitor-treated cells (Fig. 4b and Extended 
Data Fig. 8a—h). The first detectable changes started at 24 h, and we ob- 
served significantly altered phenotypes at 48 h and 72 h (Extended Data 
Fig. 8i). These GSKJ4 effects appear to be connected to the demethylase 
activity of JMJD3, as overexpression of catalytically inactive JMJD3 
did not rescue the phenotype (Extended Data Fig. 8j, k). The growth of 
myeloid leukaemia cells, stromal cells and haematopoietic progenitor 
cells (Extended Data Fig. 8l, m) was unaffected by GSKJ4, demonstrating
specificity of function. Mechanistically, we detected gene expression
changes starting at 24 h post-GSKJ4 treatment, and significant changes 
were noted at 48 h and 72 h (Extended Data Fig. 8n) and were coupled 
to an increase in the H3K27me3 levels at repressed genes (Extended Data 
Fig. 9a-c). The NOTCH1 and JMJD3 occupancy at specific NOTCH1 
target genes that were tested, as well as the total cellular levels of NOTCH1 
and JMJD3 and the chromatin H3K27me3 levels, did not significantly 
change over the treatment duration (Extended Data Fig. 9a-e). 

Genome-wide studies identified 486 downregulated genes after 72 h
of treatment of human T-ALL cells (CUTLL1) with GSKJ4 (Fig. 4c). 
There was a significant overlap between the shJMJD3 and GSKJ4 signatures
for both downregulated genes (P = 4.88 X 10° “*; Fig. 4d and
Supplementary Table 3) and upregulated genes (P = 2.57 X 107°). By 
contrast, the shUTX-upregulated gene signature significantly overlapped 
with the GSKJ4-downregulated gene signature. Furthermore, there was 
a significant overlap between genes upregulated in Utx knockout blasts 
and downregulated by GSKJ4 treatment (P = 2.49 X 10”; Figs 3e and 
4d and Supplementary Table 3), suggesting again that UTX and JMJD3 
play opposing roles in T-ALL. Genome-wide study of H3K27me3 local- 
ization demonstrated that the GSKJ4-downregulated genes experienced 
gain of H3K27me3 upon GSKJ4 treatment and were marked by the presence
of H3K4me3, NOTCH1 and JMJD3 at their promoters (Fig. 4c and
Extended Data Fig. 1j). Well-characterized NOTCH1 and JMJD3 targets
are highlighted as representative examples of the GSKJ4-downregulated/
shJMJD3-downregulated signature and show a significant gain in
H3K27me3 upon GSKJ4 treatment (Fig. 4e and Extended Data Fig. 9f).
UTX was not involved in the regulation of the oncogenic NOTCH tar- 
gets, as revealed by ChIP studies (Extended Data Fig. 9g). 

We propose targeting JMJD3 as a novel therapeutic option for pae- 
diatric and adult T-ALL. This proposal is based on recent studies**® 
that demonstrate that H3K27me3 catalysed by the PRC2 complex plays 
a key role in T-ALL, through antagonism with oncogenic NOTCH. 
We demonstrate here that NOTCH1-mediated recruitment of JMJD3 
to promoters can explain this antagonism (Extended Data Fig. 10 and 
see also Supplementary Discussion for extended discussion). We propose
that NOTCH1 recruitment leads to PRC2 eviction as a result of
the active demethylation of H3K27 through the catalytic activity of 
JMJD3 and the recruitment of JMJD3 to target promoters. By contrast, 
the reported increases in the levels of the activating H3K4me3 mark on 
a large fraction of NOTCH1 target genes (Fig. 4c) can be explained
by the fact that NOTCH1 has the ability to participate in MLL com- 
plexes (Extended Data Figs 2 and 10). Moreover, we demonstrate the 
anti-tumorigenic activities of the inhibitor GSKJ4 (ref. 5) and its spe- 
cificity for T-ALL cells. Clearly, we cannot exclude the possibility that 
GSKJ4 affects other important epigenetic modulators or signalling pathways.
Nevertheless, we consider that the main action of this inhibitor
in T-ALL is channelled through the inhibition of JMJD3 activity and 
propose that such compounds should be tested either as single drugs 
or in combination with standard chemotherapy. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 

Mice, cell culture and primary cell samples. The Jmjd3 (ref. 29) and Utx knockout
mouse models, as well as the corresponding genotyping strategy, have been
described previously. All animals used in this study were treated according to 
IACUC protocols for the laboratories of I.A., A.A.F. and R.J. The human T-ALL 
cell lines CUTLL1 (ref. 30), P12-Ichikawa, Loucy, DND41, CEM and Jurkat and
the myeloid leukaemia cell lines (THP-1 and HL-60), as well as the mouse T-ALL
line (720), were cultured in RPMI 1640 medium supplemented with 20% FBS and
penicillin and streptomycin. All cell lines were tested for the presence of mycoplas- 
ma, and only mycoplasma-free lines were used for these studies. Primary human 
samples were collected by collaborating institutions with informed consent and were 
analysed under the supervision of the Columbia University Medical Center and St. 
Jude Children’s Research Hospital Institutional Review Boards. The primary cells 
treated with GSKJ4 inhibitor (for more information on these cells, see ref. 32) were 
cultured in MEMα medium plus 10% FBS (StemCell Technologies, #06400), 10%
human AB+ serum (Invitrogen), 1% penicillin/streptomycin, 1% GlutaMAX, human
interleukin-7 (IL-7) (R&D Systems; 10 ng ml⁻¹), human Flt3 ligand (Peprotech;
20 ng ml⁻¹), human SCF (Peprotech; 50 ng ml⁻¹) and insulin (Sigma; 20 nmol l⁻¹).
Irradiated MS5 stromal cells overexpressing delta-like 1 (DLL1) were used as a
feeder layer, as previously described. 

In vitro drug treatment and shRNA treatment and cell growth, apoptosis and 
cell cycle analysis. T-ALL cells were infected twice with shRNA-expressing retro- 
viruses and selected using puromycin. Expression studies took place at different 
time points during the selection period, and we present the results from day 4 during
selection. To calculate the IC50 of GSKJ4 (GlaxoSmithKline) normalized to
the control inhibitor GSKJ5 (GlaxoSmithKline), T-ALL lines were treated with different
concentrations of the drug for 5 days. For cell growth, cell lines and primary
cultures were treated with 2 μM GSKJ4 and GSKJ5 for various times (24 h to 72 h)
and stained with annexin V and subjected to cell cycle analysis. γ-Secretase inhibitor
(γSI, specifically Compound E (Alexis Biochemicals)) was used at 500 nM for
various periods. For the cell cycle analysis, 5-bromodeoxyuridine (BrdU; 10 μM)
was added for a 1 h pulse, and incorporation into DNA was determined by using
the BrdU Flow Kit (BD Biosciences). Apoptosis was studied by quantification of
annexin V staining using the BD Biosciences kit and flow cytometry according to
standard protocols provided by the manufacturer. Doxycycline was used at 1 μg ml⁻¹
final concentration. 

Intravenous and subcutaneous xenograft studies. Studies were conducted as 
previously published*. In both cases, CUTLL1, P12 or CEM T-ALL cells expressing 
luciferase (FUW-LUC) and the corresponding shRNA (shJMJD3, shUTX or 
shRenilla) were used. For the intravenous studies, 1 X 10° cells were injected retro-orbitally
into sublethally irradiated female NRG (NOD Rag1−/− Il2rg−/−) mice. For
subcutaneous studies, 1 X 10° cells were mixed with an equal volume of BD Matrigel 
basement membrane and injected into the flanks of female NOD-SCID mice. In 
both cases, cell growth was monitored every 2 days using IVIS (Caliper, PerkinElmer). 
Transplantation for reconstitution of the haematopoietic system and for disease
progression analysis. Fetal livers from Jmjd3+/+, Jmjd3+/− and Jmjd3−/−
embryos (E13.5, Ly45.2 background) were provided by S.A.’s laboratory, and 1 X 
10° total (unfractionated) fetal liver cells were used for the reconstitution of the 
haematopoietic system of lethally irradiated recipients on a Ly45.1 background. 
Bone marrow was isolated from the recipients, followed by isolation of cells of the 
Ly45.2 background using flow cytometry. Total Ly45.2 bone marrow mononuc- 
lear cells (2.5 X 10° cells) were mixed with equal numbers of Ly45.1 (wild-type) bone 
marrow cells and transplanted into lethally irradiated recipients to study haema- 
topoietic reconstitution in a competitive setting. 

For the Utx+/+, Utx+/− and Utx−/− (Ly45.2) background, 2.5 x 10° cells of total
Ly45.2 bone marrow mononuclear cells were mixed with equal numbers of Ly45.1 
(wild-type) bone marrow cells and transplanted into lethally irradiated recipients 
to study haematopoietic reconstitution in a competitive setting similar to the Jmjd3 
study. 

In both cases, reconstitution of the haematopoietic system was monitored by 
analysis of the peripheral blood for the main haematopoietic lineages. The thymus 
and spleen of some recipients were isolated and analysed at 3 months post trans- 
plantation. 

For analysis of leukaemia progression, c-Kit+ haematopoietic progenitors from
the bone marrow of both Jmjd3 and Utx knockout models were magnetically selected 
(STEMCELL Technologies) using an antibody against CD117 (c-Kit) and were cultured
overnight in the presence of 50 ng ml⁻¹ SCF, 50 ng ml⁻¹ Flt3 ligand, 10 ng ml⁻¹
IL-3 and 10 ng ml⁻¹ IL-6. Overexpression of oncogenic Notch1 mutants (the intracellular
part of NOTCH1 (NOTCH1-IC) and DeltaE (NOTCH1-ΔE)) in bone
marrow haematopoietic progenitors followed by transplantation into mouse reci- 
pients led to the development of T-ALL, characterized by the presence of leukaemic 
blasts in the peripheral blood that infiltrated the peripheral lymphoid organs, 
progressively leading to the death of the animals (Extended Data Fig. 5c). The cells
were infected with NOTCH1-IC or NOTCH1-ΔE (and green fluorescent protein
(GFP)) expressing retroviruses twice (24 h and 48 h post c-Kit selection). Viral transduction
efficiency was determined by measuring reporter fluorescence over a total
period of 4 days, and total populations were transferred via retro-orbital injection
into lethally irradiated congenic recipients along with 2.5 X 10° total (wild-type)
bone marrow mononuclear cells for haemogenic support. GFP+ cells (4 X 10°) were
transplanted in both NOTCH1-IC and NOTCH1-ΔE studies. The Mantel-Cox test
was used for the analysis of the survival data. No randomization or blinding method 
was used during these animal studies. 
Antibodies, reagents, kits and virus production. Protein-G-coated magnetic beads 
were purchased from Invitrogen. Antibodies against the following proteins were 
used: monoclonal mouse H3K27me3 (histone H3 migrates at around 17 kDa) (Abcam, 
ab6002), monoclonal mouse H3K27me1 (Active Motif, 61015), polyclonal rabbit
H3K4me3 (Active Motif, 39159), polyclonal rabbit NOTCH (the intracellular part 
of the protein migrates at around 110 kDa), polyclonal rabbit JMJD3 (protein mi- 
grates at around 170 kDa) (Abgent, AP1022a (human) and AP1022b (mouse)), as 
well as polyclonal rabbit JMJD3 (Cell Signaling Technology, 3457), polyclonal rabbit
UTX (protein migrates at around 160 kDa) (Abcam, ab36938, and Bethyl Laboratories,
A302-374A), polyclonal rabbit NF-κB (p65, protein migrates at around
65 kDa) (Santa Cruz Biotechnology, sc-109 and sc-372) and control IgG (Santa Cruz
Biotechnology, sc-2025 (mouse) and sc-2027 (rabbit)). All antibodies for flow
cytometry were from eBioscience. All antibodies used had been tested and shown 
to be specific for the purposes we used them for by the suppliers. The acid extrac- 
tion protocol by Abcam was used for the characterization of histone mark levels 
upon GSKJ4 treatment. To generate the virus, we infected 293T cells with a plasmid 
expressing the shRNA (an miR-30-based system) against JMJD3 or UTX (shJMJD3A,
5'-CAGGGAAGTTTCGAGAAGTCCTATAGTGAAGCCACAGATGTATAGG 
ACTCTCGAACTTCCCTT-3’; shJMJD3B, 5'-ACACCAGCAGTAGCAACAGC 
AATAGTGAAGCCACAGATGTATTGCTGTTGCTACTGCTGGTGG-3’; shUTX, 
5'-ACACAAGGTAGTCTACAGAATATAGTGAAGCCACAGATGTATATTC 
TGTAGACTACCTTGTGG-3’). We also used shRenilla (5’-CTCGAGAAGGTA
TATTGCTGTTGACAGTGAGCGCAGGAATTATAATGCTTATCTATAGTG 
AAGCCACAGATGTATAGATAAGCATTATAATTCCTATGCCTACTGCCT 
CGGAATTC-3’) as a control and the retroviral packaging plasmid. Viral super- 
natant was collected over a period of 72 h and used for the transduction of T-ALL 
cells. The cells were infected twice and then selected with puromycin starting 3 days 
after viral infection. Reporter fluorescence was used (as determined by flow cyto- 
metry) for the quantification of shRNA. 
Histopathology. Organs were harvested from the animals and immersion fixed 
with 4% paraformaldehyde overnight at 4 °C. Samples were washed with PBS three
times for 1 h at room temperature and dehydrated in 70% ethanol. Samples were 
embedded in paraffin blocks. Sections (6-μm thick) were stained with haematoxylin
and eosin following standard procedures. Peripheral blood smears were briefly
fixed in methanol and stained with Wright-Giemsa solution (Fisher). Slides were 
rinsed with water, dried, mounted with Cytoseal 60 and coverslipped. 
Protein immunoprecipitation for interaction studies. For the interaction stud- 
ies between the NOTCH1 complex (NOTCH1 and MAML1) and the epigenetic 
modulators (UTX, JMJD3 and EZH2), we used standard protocols used elsewhere. 
In brief, cells were resuspended in TENT buffer (50 mM Tris, pH 8.0, 5 mM
EDTA, 150 mM NaCl and 0.05% (v/v) Tween-20) supplemented with the inhibitors
at a concentration of 20 X 10° cells ml⁻¹ buffer. Cell lysates were passed through a
25G syringe five times and incubated on ice for 30 min, followed by centrifugation
to remove cell debris (5 min, 13,000g). The cleared lysate was precleared with beads
for 1 h at 4 °C to decrease non-specific binding and incubated overnight with the
corresponding antibody-bound bead complexes. Five micrograms of antibody was 
used for 3 mg of extracts. 
RNA-seq library preparation and analysis. Whole RNA was extracted from 1-5 
X 10° T-ALL cells or primary cells using the RNeasy kit (QIAGEN) according to 
the manufacturer’s protocol. Poly(A)+ RNA was enriched using magnetic oligo(dT)-containing
beads (Invitrogen). cDNA was prepared and strand-specific libraries
were constructed using the dUTP method as described previously. Libraries were
sequenced on the Illumina HiSeq 2000 using the 50-base-pair single-read method. 
ChIP and ChIP-seq library preparation. ChIP experiments were performed as 
described previously’. In brief, for the analysis of histone marks, we fixed the cells 
with 1% formaldehyde for 10 min at 25 °C and lysed them by the addition of nucleus 
incubation buffer (15 mM Tris, pH 7.5, 60 mM KCl, 150 mM NaCl, 15 mM MgCl2,
1 mM CaCl2, 250 mM sucrose and 0.3% NP-40) and incubation at 4 °C for 10 min.
The nuclei were washed once with digest buffer (10 mM NaCl, 10 mM Tris, pH 7.5,
3 mM MgCl2 and 1 mM CaCl2), and we used micrococcal nuclease (USB) in digest
buffer to generate mononucleosomal particles. The reaction was stopped by the 
addition of EDTA (20 mM). The nuclei were lysed in nucleus lysis buffer (50 mM 
Tris-HCl, pH 8.0, 10 mM EDTA, pH 8.0, and 1% SDS) followed by sonication using 
a Bioruptor (Diagenode), and chromatin was precleared by the addition of nine 
volumes of IP dilution buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA,
pH 8.0, 16.7 mM Tris-HCl, pH 8.0, and 167 mM NaCl) and magnetic Dynabeads.
One per cent of the chromatin was kept as input. We coupled 2.5 μg antibody with
25 μl of beads for 4 h in reaction buffer, and the complex was added to precleared
chromatin (the equivalent of 10°-10° cells, depending on the antibody) followed 
by overnight incubation at 4 °C with rotation. We washed the complexes bound to 
the beads using buffers with increasing salt concentration: once with wash A (20 mM 
Tris-HCl, pH 8, 150 mM NaCl, 2 mM EDTA, 1% (w/v) Triton X-100 and 0.1% (w/v) 
SDS), once with wash B (20 mM Tris-HCl, pH 8.0, 500 mM NaCl, 2 mM EDTA,
1% (w/v) Triton X-100 and 0.1% (w/v) SDS), once with wash C (10 mM Tris-HCl, 
pH 8.0, 250 mM LiCl, 1 mM EDTA, 1% (w/v) NP-40 and 1% (w/v) deoxycholic acid) 
and twice with TE, followed by treatment with RNase and proteinase K. The cross- 
links were then reversed, and the DNA was precipitated using ethanol and glycogen. 
For JMJD3 ChIP, the cells were fixed with 1% formaldehyde for 10 min at 25 °C 
and lysed on ice using 1 ml cell lysis buffer (50 mM HEPES-KOH, pH 7.5, 140 mM 
NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40 and 0.25% Triton X-100) per 1 X 
10’ cells. We resuspended the pellet in 1 ml buffer II (10 mM Tris-HCl, pH 8, 200 mM 
NaCl, 1 mM EDTA, pH 8, and 0.5 mM EGTA) per 1 x 10’ cells. We further resus- 
pended the nuclei in buffer III (10 mM Tris-HCl, pH 8, 100 mM NaCl, 1 mM EDTA, 
0.5 mM EGTA, 0.1% sodium deoxycholate and 0.5% n-lauroylsarcosine) and son- 
icated the solution with a Bioruptor for 40 min. Triton X-100 was added to a final 
concentration of 1%, and the chromatin preparation was precleared using magnetic
beads. The antibody (5 μg) was coupled to the magnetic beads (50 μl) as in
the case of the histone marks, and the complex was added to the precleared chromatin
(the equivalent of 1 x 10’ cells per reaction). The reaction mix was then incubated
for 12-16 h. The beads with the immunoprecipitated chromatin fragments
were washed eight times with RIPA buffer (50 mM HEPES-KOH, pH 7.6, 300 mM 
LiCl, 1 mM EDTA, 1% NP-40 (IGEPAL) and 0.7% sodium deoxycholate) and once 
with TE. The DNA was cleaned as in the case of the chromatin marks (see above). 
Libraries were generated as described previously’, including end repair, A-tailing, 
adaptor ligation (Illumina TruSeq system) and PCR amplification of the libraries. 
AMPure XP beads (Beckman Coulter, A63880) were used for DNA cleaning in 
each step of the process. 
Sequence analysis of primary samples. Sequencing and analysis of paediatric T- 
ALL samples was conducted as described in previously published studies***. In brief, 
sequencing of UTX in the paediatric T-ALL cohort was performed by PCR of whole 
genome amplified DNA, followed by sequencing using 3730xl instruments (Applied
Biosystems) as previously described”’. Single nucleotide variations were detected 
by SNPdetector** and PolyScan® and validated by sequencing of both tumour and 
matched non-tumour samples. A total of 107 paediatric patients were screened, 
including 64 cases with ETP ALL (25 females and 39 males) and 43 with ‘typical’ 
T-ALL (8 females and 35 males). UTX mutations were detected in 4.7% of the total 
population and in 6.8% of the male population. No UTX mutations were detected 
in female samples. The two deletions and one of the frameshift mutations were 
found in patients with typical T-ALL, and the other two in patients with ETP ALL. 
Regarding the adult T-ALL case, all 83 samples were collected in the Eastern Co- 
operative Oncology Group (ECOG) clinical trials E2993 (ref. 40) and C10403 and 
analysed under the supervision of the Columbia University Medical Center Institu- 
tional Review Board. Informed consent to use leftover material for research purposes 
was obtained from all of the patients at trial entry in accordance with the Declaration 
of Helsinki. All exon sequences from UTX were amplified from genomic DNA by 
PCR and analysed by direct dideoxynucleotide sequencing. The primer sequences 
used for UTX sequencing have been described previously”’. 
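
The mutation frequencies quoted for the paediatric cohort can be checked with simple arithmetic. The following minimal Python sketch uses only the cohort sizes stated above; the count of five mutated patients is an assumption inferred from the five lesions described (two deletions, one frameshift and two others), with each lesion taken to come from a different male patient.

```python
# Cohort sizes as reported in the text.
etp_females, etp_males = 25, 39
typical_females, typical_males = 8, 35

# Assumption: the five UTX lesions described correspond to five
# distinct patients, all male (no mutations were found in females).
mutated_males = 5

total_patients = etp_females + etp_males + typical_females + typical_males
total_males = etp_males + typical_males

pct_of_total = 100 * mutated_males / total_patients
pct_of_males = 100 * mutated_males / total_males

print(f"{total_patients} patients, {total_males} males")
print(f"UTX mutated: {pct_of_total:.1f}% of all patients, {pct_of_males:.1f}% of males")
```

Under that assumption the sketch reproduces the reported 4.7% and 6.8%.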
Data sources and computational tools. Patient and physiological T-cell express- 
ion data were obtained from refs 2, 24, 41. Human genome assembly version hg19/ 
GRCh37 and Ensembl annotation release 69 were used for the RNA-seq, ChIP-seq 
and data integration analyses. NOTCH1, RBP-J, H3K4me3 and H3K27me3 ChIP-seq
data for CUTLL1 cells were obtained from ref. 27. For the functional enrichment
analysis, MSigDB version 3.1 was used. Bowtie version 0.12.7 was used for
alignment of sequenced reads. RNA-seq data analysis was performed using DEGseq“*. 
MACS* version 2.0.10 was used for JMJD3 ChIP-seq peak discovery, in conjunc- 
tion with the irreproducible discovery rate (IDR) method**. GenomicTools” version 
2.7.2 was used for performing genomic interval mathematical operations, genomic 
interval annotation, H3K27me3 ChIP-seq comparisons (GSKJ4 versus control) and 
ChIP-seq heatmap generation. 
Expression analysis of primary samples. Processed T-ALL and B-ALL patient 
microarray expression data were downloaded from ref. 2 (GEO accession GSE33315), 
physiological T-cell expression data from ref. 24 (GEO accession GSE22601) and 
acute myeloid leukaemia (AML) expression data from ref. 41 (GEO accession GSE6891). 
Data were first converted to the logarithmic scale when necessary and then quan- 
tile-normalized across samples. The Wilcoxon two-sided unpaired test per gene 
probe was used to determine significant differences between sample categories 
(T-ALL, B-ALL, AML and physiological T cells; Fig. 1e). A gene was considered
significantly overexpressed in T-ALL compared with the rest of the sample categories
if at least one of its associated probes was significantly overexpressed in
T-ALL according to the statistical test. 
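
The preprocessing described above (log conversion followed by quantile normalization across samples) can be illustrated with a minimal Python sketch. The function name and the toy intensities are hypothetical, ties are broken arbitrarily rather than averaged, and this is not the code used in the study.

```python
import math

def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists (one list per sample)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the i-th smallest value across samples.
    reference = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest value in this sample.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical raw intensities for four probes in two samples.
raw = [[32, 4, 8, 16], [16, 2, 64, 4]]
logged = [[math.log2(v) for v in sample] for sample in raw]
normalized = quantile_normalize(logged)
print(normalized)
```

After normalization every sample shares the same set of values, so per-probe rank tests between sample categories are not driven by array-wide intensity differences.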

Genes experiencing loss of H3K27me3 at TSSs in our mouse NOTCH-IC model
compared with normal double-positive (DP) mouse cells were obtained from our previous
study. Enrichment of human homologues of these genes in JMJD3-correlating
genes in the patient data described above was estimated as follows. First, Pearson’s 
correlation of JMJD3 expression (separately for each JMJD3 probe) against expres- 
sion of each gene was computed. Then, the distribution of the correlations of the 
genes losing H3K27me3 (human homologues of the mouse genes) was compared 
with that of the genes that did not lose H3K27me3, using Student's t-test (separately 
for each JMJD3 probe, minimum P value shown in the corresponding figure (Fig. 1f)) 
or the Wilcoxon one-sided unpaired test (data not shown), yielding similar results. 
This analysis was repeated for NF-κB1, NF-κB2, REL, RELA, RELB, HES1, UTX
and EZH2 (Extended Data Fig. 1i). 
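
The correlation-enrichment procedure above can be sketched as follows. This is a minimal illustration on made-up expression values: gene names, values and function names are hypothetical, only the t statistic is computed (no P value), and a single JMJD3 probe is assumed.

```python
import math
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def student_t(group1, group2):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Hypothetical expression of one JMJD3 probe and four genes across five samples.
jmjd3 = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "gene_a": [2.0, 4.1, 6.2, 7.9, 10.0],
    "gene_b": [1.0, 2.2, 2.9, 4.1, 5.2],
    "gene_c": [5.0, 3.0, 4.0, 2.0, 4.5],
    "gene_d": [3.0, 3.5, 2.0, 3.2, 2.8],
}
losing_h3k27me3 = {"gene_a", "gene_b"}  # hypothetical gene set

# Step 1: correlate JMJD3 expression with every gene.
corr = {gene: pearson(jmjd3, values) for gene, values in expression.items()}

# Step 2: compare correlations of genes that lose the mark with the rest.
inside = [corr[g] for g in sorted(losing_h3k27me3)]
outside = [corr[g] for g in sorted(expression) if g not in losing_h3k27me3]
t_stat = student_t(inside, outside)
print(corr, t_stat)
```

A positive t statistic indicates that the genes losing H3K27me3 tend to correlate more strongly with JMJD3 expression than the remaining genes.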
JMJD3 peak identification, characterization and overlap with published data 
sets. JMJD3 ChIP-seq reads were aligned using Bowtie (with default parameters, 
except for -m 1 so as to report only unique alignments) on human assembly ver- 
sion hg19. Peak discovery was performed with MACS (version 2.0.10) using default 
parameters, except for using a fragment size of 300 base pairs as estimated with the 
Agilent 2100 Bioanalyzer. Sonicated input was used as a control for peak discovery. 
Then, we used the IDR method, guidelines and pipeline available for narrow peaks
at the URL https://sites.google.com/site/anshulkundaje/projects/idr to determine 
highly reproducible peaks supported by both JMJD3 replicates. 

JMJD3 peaks were characterized according to their genome-wide distribution 
(Extended Data Fig. 1k) into the following groups: (a) 1-kilobase (kb) TSS-flanking 
regions of transcript isoforms; (b) gene body regions (excluding any regions overlapping
with (a)); and (c) upstream regions of a minimum of 10 kb and a maximum
of 100 kb (excluding any regions overlapping with (a) or (b)). 
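
The three-way grouping of peaks can be sketched for a single gene as shown below. This is a simplified illustration rather than the GenomicTools implementation: it assumes that the TSS-flanking region extends 1 kb on each side of the TSS and that the upstream region spans 10 to 100 kb upstream of the TSS, and the function name and coordinates are hypothetical.

```python
def classify_peak(peak_start, peak_end, tss, gene_end, strand="+"):
    """Assign a peak to 'tss', 'gene_body', 'upstream' or 'other' for one gene."""
    def overlaps(a_start, a_end, b_start, b_end):
        return a_start <= b_end and b_start <= a_end

    # (a) TSS-flanking region; assumed here to be 1 kb on each side.
    if overlaps(peak_start, peak_end, tss - 1000, tss + 1000):
        return "tss"
    # (b) Gene body, checked only after (a) so the two groups do not overlap.
    body = (tss, gene_end) if strand == "+" else (gene_end, tss)
    if overlaps(peak_start, peak_end, *body):
        return "gene_body"
    # (c) Upstream region, assumed to span 10 to 100 kb upstream of the TSS.
    if strand == "+":
        upstream = (tss - 100_000, tss - 10_000)
    else:
        upstream = (tss + 10_000, tss + 100_000)
    if overlaps(peak_start, peak_end, *upstream):
        return "upstream"
    return "other"

print(classify_peak(9_500, 9_700, tss=10_000, gene_end=50_000))
print(classify_peak(30_000, 30_200, tss=10_000, gene_end=50_000))
print(classify_peak(150_000, 150_300, tss=200_000, gene_end=250_000))
```

Testing the groups in the order (a), (b), (c) mirrors the exclusions stated in the text: a peak counted in an earlier group is never counted again in a later one.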

Co-occurrence of JMJD3 peaks with H3K4me3, H3K27me3, NOTCH1 and RBP-J
was computed as the percentage of such peaks (5,000 top-scoring peaks for each
protein obtained from ref. 27; GEO accession GSE29600) that have some overlap 
with a JMJD3 peak. The statistical significance of these overlaps was determined 
using random resampling simulation (for example, H3K4me3 peaks were randomly 
redistributed along the genome). As a control, we used the percentage of TSSs that 
have JMJD3 peaks (this is a rather conservative control since genome-wide JMJD3 
occupancy is much lower, as a result of JMJD3 being concentrated in TSSs), and 
compared with this control, we obtained an ~7-fold enrichment of H3K4me3- 
JMJD3 (P < 0.001 as determined by the random resampling scheme). Similar enrich- 
ments were obtained for NOTCH1-JMJD3 and RBP-J-JMJD3 co-occurrence, 
whereas no significant enrichment was observed for H3K27me3-silenced or H3K4me1
enhancer-related regions (Extended Data Fig. 1j). 
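
The co-occurrence percentage and its resampling-based significance can be illustrated with a minimal Python sketch. It assumes a single linear chromosome, uniform random relocation of the query peaks and a fixed random seed; all function names and coordinates are hypothetical, and the study's own resampling scheme may differ in detail.

```python
import random

def overlaps_any(interval, peaks):
    """True if the interval overlaps at least one peak."""
    start, end = interval
    return any(start <= p_end and p_start <= end for p_start, p_end in peaks)

def percent_overlapping(query_peaks, jmjd3_peaks):
    """Percentage of query peaks that overlap a JMJD3 peak."""
    hits = sum(overlaps_any(q, jmjd3_peaks) for q in query_peaks)
    return 100.0 * hits / len(query_peaks)

def resampling_p_value(query_peaks, jmjd3_peaks, genome_length, n_iter=1000, seed=0):
    """Empirical P value from randomly redistributing the query peaks."""
    rng = random.Random(seed)
    observed = percent_overlapping(query_peaks, jmjd3_peaks)
    at_least = 0
    for _ in range(n_iter):
        shuffled = []
        for start, end in query_peaks:
            width = end - start
            new_start = rng.randrange(0, genome_length - width)
            shuffled.append((new_start, new_start + width))
        if percent_overlapping(shuffled, jmjd3_peaks) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (n_iter + 1)

jmjd3_peaks = [(1_000, 1_500), (50_000, 50_500), (200_000, 200_400)]
h3k4me3_peaks = [(1_200, 1_400), (50_100, 50_300), (200_100, 200_300), (700_000, 700_200)]
observed, p_value = resampling_p_value(h3k4me3_peaks, jmjd3_peaks, genome_length=1_000_000)
print(observed, p_value)
```

With 1,000 iterations the smallest attainable empirical P value is about 0.001, which matches the form in which the resampling result is reported above (P < 0.001).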
RNA-seq analysis. Differential gene expression analysis was performed for each 
matched knockdown versus control pair, separately in each biological or technical 
replicate in each of two cell lines (CUTLL1 and CEM). Three types of comparisons 
were tested: (1) JMJD3 knockdown versus Renilla; (2) JMJD3 knockdown versus 
UTX knockdown; and (3) UTX knockdown versus Renilla. DEGseq** was used to 
analyse (a) matched knockdown-Renilla replicates in separate DEGseq runs and 
(b) all replicates on a combined DEGexp run. For the mouse (Utx knockout) sam- 
ples, spleen and bone marrow from a wild-type male (referred to as animal #9), as 
well as spleen from a wild-type female (animal #10), were compared with spleen 
and bone marrow from a knockout male (#23) and spleen from another knockout 
male (#27) (see also our GEO accession GSE56696). For illustration, scatter plots 
(Fig. 2d and Extended Data Fig. 3g–i) were created using values obtained from DEGseq
analysis of merged biological and/or technical replicates. Gene RNA-seq FPKM 
values were computed using GenomicTools*’. The P-value cutoff for differential 
expression was set at 1 X 10” °, with the minimum absolute log2 fold change set at
0.5. However, all key results in this study (that is, the significance of the overlaps of 
the various gene expression signatures demonstrating the contrasting roles of JMJD3 
and UTX) are robust to changes in these two parameters (data not shown). 
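
The two-parameter cutoff can be expressed as a small filter. In the sketch below the P-value cutoff is passed in explicitly and the value used in the example (1e-5) is only a placeholder; the gene names, P values and fold changes are likewise hypothetical, and a strict inequality on the P value is an assumption.

```python
def split_signature(results, p_cutoff, min_abs_log2_fc=0.5):
    """Split genes into upregulated and downregulated sets.

    results maps gene name -> (p_value, log2_fold_change).
    """
    up, down = set(), set()
    for gene, (p_value, log2_fc) in results.items():
        if p_value < p_cutoff and abs(log2_fc) >= min_abs_log2_fc:
            (up if log2_fc > 0 else down).add(gene)
    return up, down

# Hypothetical differential expression results.
toy_results = {
    "gene_a": (1e-12, -1.4),  # significant, downregulated
    "gene_b": (1e-9, 0.3),    # significant P value, fold change too small
    "gene_c": (0.2, 2.0),     # large fold change, not significant
    "gene_d": (1e-8, 0.9),    # significant, upregulated
}
up, down = split_signature(toy_results, p_cutoff=1e-5)  # placeholder cutoff
print(sorted(up), sorted(down))
```

Re-running such a filter over a grid of cutoffs is one simple way to examine the robustness to these two parameters mentioned above.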

The P value of a gene set of size t (for example, GSKJ4-downregulated genes) containing
k genes with a specific attribute (for example, shJMJD3-mediated downregulation
or UTX knockout upregulation) was determined against the null hypothesis
that k or more such genes could have been observed merely by chance in an equal 
sized gene set that was randomly drawn from the entire reference set of genes of 
size N (that is, all downregulated, upregulated and constant genes). This P value was 
obtained by using the hypergeometric cumulative distribution with parameters N, 
t, k and n, where n is the number of genes possessing the attribute in the entire 
reference gene set of size N. 
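
The overlap P value defined above is the upper tail of the hypergeometric distribution. A minimal Python sketch using only the standard library is given below; the function name is hypothetical and the numbers in the example are toy values, not data from this study.

```python
from math import comb

def hypergeom_upper_tail(N, n, t, k):
    """P(X >= k) when t genes are drawn without replacement from N genes,
    n of which carry the attribute."""
    total = comb(N, t)
    upper = min(n, t)
    # comb returns 0 when more genes are requested than are available,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(n, i) * comb(N - n, t - i) for i in range(k, upper + 1))
    return favourable / total

# Toy example: 10 genes in total, 4 with the attribute, gene set of size 3.
print(hypergeom_upper_tail(N=10, n=4, t=3, k=1))
print(hypergeom_upper_tail(N=10, n=4, t=3, k=3))
```

Because Python integers have arbitrary precision, the binomial coefficients are exact even for genome-scale values of N, and only the final division is performed in floating point.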

H3K27me3 gain and loss analysis. JMJD3-affected (upregulated or downregulated)
genes were defined as genes whose expression was significantly differentially ex- 
pressed in JMJD3 knockdown cells compared with both Renilla and UTX knock- 
down cells. Changes in JMJD3 binding and the H3K27me3 mark around gene 
TSSs between cells treated with the inhibitor GSKJ4 and the control GSKJ5 were
determined using GenomicTools (“genomic_apps peakdiff” tool) as described in
a previously published study. Epigenetic changes between the treatment (shJMJD3
or GSKJ4) and control samples were determined by evaluating sliding windows 
across the genome using the following protocol. First, enriched ChIP-seq windows 
were identified separately for each of the two samples under comparison using a 
window-based approach and the binomial probability distribution to compare signal 
reads with control reads in each window. Subsequently, for each genomic window 
enriched in at least one of the two samples, the total number of reads was determined, 
and the window read counts were normalized using quantile normalization across 
biological replicates and samples before comparison. Finally, for each window, the 
fold change between the two samples was calculated (GSKJ4 versus control, and 
vice versa). To estimate the false discovery rate, the distribution of the observed 
H3K27me3 fold changes was compared with the distribution of fold changes bet- 
ween replicates of the same treatment. This comparison was performed indepen- 
dently at different H3K27me3 read density levels to control for artificially high 
fold changes due to low read counts in the denominator. Significant epigenetic 
changes are reported at 5% false discovery. 
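
Two steps of the window-based protocol above, the binomial enrichment test and the per-window fold change, can be sketched as follows. This is an illustration under stated assumptions rather than the "genomic_apps peakdiff" implementation: the null proportion is taken from the library sizes, a pseudocount of 1 is added before taking the ratio, and all names and counts are hypothetical.

```python
from math import comb, log2

def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, i) * p**i * (1 - p) ** (trials - i)
               for i in range(successes, trials + 1))

def window_enrichment_p(signal_reads, control_reads, total_signal, total_control):
    """P value that a window holds at least this many signal reads by chance.

    Assumption: under the null, each read in the window comes from the
    signal library with probability proportional to its library size.
    """
    p0 = total_signal / (total_signal + total_control)
    return binomial_upper_tail(signal_reads, signal_reads + control_reads, p0)

def log2_fold_change(treated_count, control_count, pseudocount=1.0):
    """log2 ratio of normalized window counts (the pseudocount is an assumption)."""
    return log2((treated_count + pseudocount) / (control_count + pseudocount))

print(window_enrichment_p(10, 0, total_signal=1000, total_control=1000))
print(log2_fold_change(7, 1))
```

Comparing the resulting fold changes with those seen between replicates of the same treatment, as described above, then provides the empirical false discovery estimate.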

JMJD3, NOTCH1, H3K4me3 and H3K27me3 heatmaps were generated using
GenomicTools (“genomic_apps heatmap” utility) over log-transformed read counts
in 200-nucleotide non-overlapping bins of 4-kb-flanked TSSs. Box plots of H3K27me3
log2 [fold changes] (GSKJ4 versus control) show the distribution of values in
(a) JMJD3 targets, (b) commonly downregulated genes upon shJMJD3 and GSKJ4 
treatment, and (c) the intersection of GSKJ4-upregulated and shJMJD3-upregulated 
genes as a negative control. P values were computed using a one-sided Wilcoxon 
unpaired test for (a) and (b) versus the control (c). 
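
The heatmap binning can be illustrated with a short sketch: a window of 4 kb on each side of a TSS, divided into 200-nucleotide non-overlapping bins, gives 40 bins per TSS. The code below is not the "genomic_apps heatmap" utility; the function name and read positions are hypothetical, strand orientation is ignored, and the pseudocount of 1 before the log transform is an assumption.

```python
from math import log2

def tss_bin_profile(read_positions, tss, flank=4000, bin_size=200):
    """Log-transformed read counts in non-overlapping bins around one TSS."""
    n_bins = (2 * flank) // bin_size
    counts = [0] * n_bins
    window_start = tss - flank
    for pos in read_positions:
        offset = pos - window_start
        if 0 <= offset < 2 * flank:  # ignore reads outside the window
            counts[offset // bin_size] += 1
    # Pseudocount of 1 (an assumption) keeps empty bins finite after the log.
    return [log2(c + 1) for c in counts]

reads = [10_000, 10_050, 6_000, 13_999, 14_000, 5_999]
profile = tss_bin_profile(reads, tss=10_000)
print(len(profile), profile[0], profile[20], profile[39])
```

Each TSS contributes one such row, and stacking the rows for all TSSs of interest yields the matrix that is drawn as a heatmap.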
RNA-seq and ChIP-seq replicate reproducibility. For RNA-seq experiments, we 
focused on the reproducibility of gene expression levels as measured by FPKM values. 
For each pair of replicates, we computed the Spearman and Pearson correlations, as 
well as the Pearson correlation on log-transformed FPKM values. In general, Pearson 
correlations were much higher because higher values are dominant, and highly 
expressed genes tend to be more reproducible. Using a Pearson correlation on log- 
transformed values attempts to balance the expression distribution and allow con- 
tributions from genes that are expressed at a lower level, thereby providing a more 
realistic genome-wide reproducibility metric. Spearman correlations focus on the 
ranking of gene expression, and in our experiments, in general, were a more con- 
servative (that is, lower) and consistent (lower variability across various settings, 
and when comparing different cell lines (that is, CUTLL1 and CEM)) estimate of 
reproducibility; therefore, for simplicity, we have reported only the Spearman 
correlations. 
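
The three reproducibility metrics compared above can be computed as in the following minimal sketch. The FPKM values are hypothetical, the pseudocount of 1 in the log transform is an assumption, and ties are handled with average ranks.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def log_pearson(x, y, pseudocount=1.0):
    """Pearson correlation on log2-transformed values."""
    return pearson([math.log2(v + pseudocount) for v in x],
                   [math.log2(v + pseudocount) for v in y])

# Hypothetical FPKM values for five genes in two replicates.
rep1 = [0.5, 2.0, 10.0, 100.0, 1000.0]
rep2 = [1.0, 1.5, 12.0, 90.0, 1100.0]
print(pearson(rep1, rep2), log_pearson(rep1, rep2), spearman(rep1, rep2))
```

Because the Spearman correlation depends only on ranks, it is unaffected by the few very highly expressed genes that dominate the untransformed Pearson correlation.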

For ChIP-seq ‘broad peak’ experiments (H3K27me3), we also used Pearson, log- 
transformed Pearson and Spearman correlations on (a) TSSs and (b) all genome-wide 
peaks. As before, Spearman correlation was the most conservative and consistent 
estimate of reproducibility. 


For ChIP-seq ‘narrow peak’ experiments (JMJD3), in addition to TSS-based and 
genome-based correlations, we used the IDR method, guidelines and pipeline available
for narrow peaks at the URL https://sites.google.com/site/anshulkundaje/projects/idr.
Apart from determining the reproducibility, we also used the IDR
method to determine high-confidence peaks supported by both JMJD3 replicates. 
Extended Data Figure 1 | JMJD3 is induced through activation of the
NF-κB pathway in a NOTCH1-dependent mode in T-ALL and binds to
NOTCH1 target genes. a, Levels of p65 (RELA) protein in control T cells and
T-ALL tumour cells. A representative sample from three mice is shown.
b, Schematic representation of the Jmjd3 locus showing the p65 binding site
(upper) and ChIP analysis for p65 binding to the Jmjd3 locus in mouse control
T cells and T-ALL tumour cells, as well as T-ALL cells upon treatment with
γ-secretase inhibitor (γSI), which affects NOTCH1 levels (centre). NOTCH1
binding to this region upon γSI treatment in T-ALL cells is also shown (right).
c, Analysis of JMJD3 and HES1 messenger RNA levels upon γSI treatment
of CUTLL1 cells. The average of three independent studies is shown.
d, e, Expression levels of the JMJD3 transcript (d) and protein (e) upon
treatment of human T-ALL lines (DND41 and CEM) with a NEMO binding
domain (NBD) inhibitor of the NF-κB pathway. f, JMJD3 levels in T-ALL cells
upon inhibition of the NF-κB pathway using a dominant negative form of IκBα
(DN-IκBα). g, h, ChIP for NOTCH1 (g) and H3K27me3 (h) on the Hes1
promoter upon γSI treatment of mouse T-ALL cells. In d and f-h, the average of
three studies is shown. In e, a representative example from three studies is
shown. i, Genes correlated with selected human genes (including JMJD3 and
NFKB1) were tested for enrichment in loss-of-H3K27me3 genes during the
transition to T-ALL in the mouse model. j, Overlap of JMJD3 peaks with
peaks of important activating (H3K4me3 and H3K4me1) and repressive
(H3K27me3) epigenetic marks, as well as members of the NOTCH1 complex.
The percentage of TSSs containing JMJD3 peaks was used as a conservative
control and is an alternative to the much lower genome-wide JMJD3
occupancy. k, Genome-wide distribution of JMJD3 peaks in human T-ALL. 
Extended Data Figure 2 | JMJD3 is vital for T-ALL growth through
participation in NOTCH1 transcriptional programs. a, NOTCH1
interaction analyses for JMJD3, MAML1 and WDR5 proteins in 293T cells.
Interaction with JMJD3 was confirmed in a reciprocal way (right-most lane,
immunoprecipitation (IP) using an anti-haemagglutinin (HA) antibody).
b, Expression of JMJD3 and WDR5 in 293T cells, followed by
immunoprecipitation using the anti-HA antibody against HA-JMJD3. An
anti-Flag antibody was used for the detection of both proteins. c, NOTCH1
interaction studies for JMJD3 and MAML1 proteins in mouse T-ALL cells
expressing a Flag/Strep form of intracellular NOTCH1. StrepTactin beads were
used for NOTCH precipitation in the absence of detectable intracellular
NOTCH, and different antibodies were used for the detection of JMJD3,
MAML1, EZH2 and UTX. Extracts from green fluorescent protein (GFP)-
expressing cells were used as negative control. All experiments were repeated
three times (biological replicates), and a representative example is shown.
d, mRNA expression of JMJD3 and UTX upon treatment with shRNA against
JMJD3 or UTX. The expression after treatment of CEM cells with two shRNAs
against JMJD3 and one shRNA against UTX and one control (Renilla) is shown.
e, The effects on cell proliferation as measured by the loss of GFP-expressing
shRNA. HL-60 is an acute promyelocytic leukaemia cell line (APL), which
is a subtype of acute myeloid leukaemia (AML) and is used as control in this
study. For both cell lines, the average results from three representative studies
are shown. f, Annexin V staining upon shJMJD3 and shRenilla treatment of
CUTLL1 cells (top) and HPB-ALL cells (bottom).
Extended Data Figure 3 | JMJD3 binds to genes with important oncogenic
functions and is vital for T-ALL growth. a, JMJD3 but not UTX genetic
inactivation impairs the expression of important oncogenic genes. NOTCH1,
MYC and MAZ, as well as JMJD3, expression levels are shown. shUTX
treatment results in significant upregulation of JMJD3 compared with shRenilla
(control)-treated cells. The average results from three studies are shown.
b, Significant expression changes in NRARP transcript levels upon JMJD3
knockdown. c, ChIP for H3K27me3 on the NRARP locus. d, e, Binding of
JMJD3 to the NOTCH1 (d) and MAZ (e) promoters upon shJMJD3 and
shRenilla (control) treatment. The average results from three studies are shown.
f, Numbers of upregulated and downregulated genes are shown for shJMJD3-
and shUTX-treated cells compared with shRenilla-treated cells. g, Scatter plot
showing the expression levels of important genes in shJMJD3- and shUTX-
treated CUTLL1 T-ALL cells. Emphasis is given to the NOTCH1 pathway and
apoptosis-related genes. This is a scatter plot representation of an expression
analysis comparing three independent studies for shJMJD3 and two for shUTX.
h, i, Scatter plots showing the expression levels of important genes in
shJMJD3- and shRenilla-treated CCRF-CEM T-ALL cells (h) and in shUTX-
treated CCRF-CEM T-ALL cells (i). CCRF-CEM cells exhibit increased
NOTCH levels through mutations in the heterodimerization (HD) domain of
NOTCH1 and in the NOTCH1-associated ligase FBXW7. Emphasis is given to
the NOTCH1 pathway and apoptosis-related genes. This is a scatter plot
representation of an expression analysis comparing two studies for shJMJD3,
two for shUTX and two for shRenilla.
Extended Data Figure 4 | In vivo studies of the role of JMJD3 in T-ALL
using luciferase analysis of CEM-, P12- and CUTLL1-based xenograft
models in immunocompromised (NRG) mouse recipients. a, b, In vivo
growth of CEM T-ALL cells in subcutaneous xenograft studies upon genomic
ablation of JMJD3 and UTX (red and green circles denote shJMJD3-expressing
cells (two different shRNAs); blue denotes shUTX-expressing cells; and
black circles denote shRenilla-expressing cells). One million CEM cells were
injected into the animals, and representative graphs from five mouse recipients
and an image of a representative mouse on days 0 and 6 are shown
(a). Representative graphs from five mouse recipients and the average luciferase
intensity on days 0 and 6 are shown (b). c, Results for growth of CEM cells at
different time points post transplantation in subcutaneous xenograft studies
(n = 5). d, Comparison of in vivo cell growth in the subcutaneous model
of shJMJD3-, shUTX- and shRenilla-expressing P12 cells (n = 5).
One million P12 cells were injected into sublethally irradiated NRG
(immunocompromised) recipients, and the mice were monitored every day for
luciferase activity. Day 0 was the first day that a substantially detectable
luciferase intensity was measured. The last day of the experiment was the
day that either luciferase intensity reached saturation or the mice were
euthanized for humanitarian reasons. Red and green circles denote shJMJD3-
expressing cells (two different shRNAs, shJMJD3A and shJMJD3B); blue
denotes shUTX-expressing cells; and black circles denote shRenilla-expressing
cells. e, Monitoring the change in luciferase intensity over a period of
seven days in the subcutaneous xenograft model using CUTLL1 T-ALL cells
(n = 4). f, g, Intravenous xenograft studies using CUTLL1 cells injected into
sublethally irradiated NRG (immunocompromised) recipients (n = 8 or 6,
as indicated in the figure). In e–g, 0.5 × 10⁶ CUTLL1 cells were transplanted,
and the mice were monitored every day for luciferase activity.
Extended Data Figure 5 | UTX is a tumour suppressor and is genetically
inactivated in T-ALL but is dispensable for physiological T-cell 
development. a, b, Study of lymphoid development in Utx−/Y compared with
Utx+/+ (or Utx+/Y, data not shown) background mice. Flow cytometric
analyses of CD4 and CD8 expression (a), and the relative proportions of
CD4⁺CD8⁺ (double-positive) thymocytes across different genotypes (b) are
shown. A representative example from three independent samples (biological
replicates) is shown. c, Illustration of the transplantation scheme for the in vivo
leukaemia studies. d, e, T-ALL progresses faster in the male Utx knockout
background (Utx−/Y) than in the female wild-type background (Utx+/+) in
recipients transplanted with NOTCH1-IC-GFP-expressing haematopoietic
progenitors, as is demonstrated by the white blood cell counts in the peripheral
blood (d), as well as the percentage of GFP⁺ leukaemic cells in the peripheral
blood upon transplantation of wild-type progenitors (e) from female mice
(Utx+/+) compared with the corresponding knockout cells (Utx−/Y). f, Survival
study of the recipients of cells from male wild-type (Utx+/Y, n = 7) and
knockout (Utx−/Y, n = 5) mice expressing NOTCH1-deltaE (ΔE)-GFP (an
allele with weaker oncogenic action than NOTCH1-IC). g, h, Survival analysis
of recipients upon transplantation of wild-type progenitors from female
mice (Utx+/+) compared with the corresponding knockout cells (Utx−/Y)
carrying NOTCH1-IC (g) or NOTCH1-ΔE (h). i, Quantitative PCR (qPCR)
validation of the expression levels of one downregulated gene (Suz12) and one
upregulated gene (Il7r) in Utx−/Y (compared with Utx+/Y) mice. The average
results from three independent samples (studies) are presented. j, Targeted
Sanger sequencing in paediatric T-ALL led to the identification of three cases
with frameshift mutations. The positions of the mutations are indicated by
dashed lines in the electropherograms. k, Identification of one in-frame
deletion (p.Ala14_Ala17del, #1, top panel), one splice acceptor site (#2, second
panel) and one missense mutation (#3, third panel) in adult T-ALL. Case #4 is
an adult T-ALL case with wild-type UTX (control, bottom panel). Mutations
are indicated by red characters. l, The levels of UTX in CUTLL1 T-ALL cells in
the absence (−dox) or presence (+dox) of doxycycline. m, n, Apoptosis
analysis through measuring annexin V staining using control LacZ-expressing
and UTX-expressing CUTLL1 cells in the absence or presence of doxycycline.
Representative plots (l, n), as well as the average results (l, m), from three
independent experiments are shown.
Extended Data Figure 6 | Physiological development of the haematopoietic
system in the absence of JMJD3. a, b, Targeting scheme for the generation of
the Jmjd3−/− mouse (a) and PCR-based quantification of the wild-type
and mutant transcripts (b) using a specific primer set for the 3′ end of Jmjd3
cDNA. c, d, Analysis of the fetal liver for lineage markers (c), as well as the
bone marrow (d) of recipients for haematopoietic progenitors (the Lin⁻c-
Kit⁺Sca1⁺ (LSK) population), for the Jmjd3+/+ and Jmjd3−/− genotypes.
Representative plots from three independent experiments are shown.
e–g, Analysis of major thymic subsets in Jmjd3+/+ (n = 7) and Jmjd3−/−
(n = 7) mice. Schematic representation of the flow cytometric analysis
performed (e). Relative proportions of the major cell populations in the thymi
of Jmjd3+/+ and Jmjd3−/− mice (f). The mRNA expression of the Jmjd3 gene at
different stages of thymic development (g). h, The expression of NOTCH1
target genes (such as Hes1, n = 7) in CD4⁺CD8⁺ (double positive) and
CD4⁻CD8⁻CD25⁺ lymphocyte progenitor cells. Representative plots (e), as
well as average results (g, h), from seven independent thymi are shown.
Extended Data Figure 8 | GSKJ4 inhibitor induces apoptosis and cell cycle
arrest of T-ALL but not myeloid leukaemia or physiological LSK cells.
a, Effect of GSKJ4 (at 2 μM concentration) on a panel of T-ALL and myeloid
lines. The average results from three representative studies are shown.
b–d, Effects on cell growth (b), apoptosis (c) and the cell cycle (d) in three
primary T-ALL lines. The average results from three representative studies are
shown. e, f, Measurement of apoptosis (e, n = 3) and cell cycle effects
(f, representative study from three experiments) on CUTLL1 cells 72 h post
treatment with the inhibitor. g, h, Apoptosis assays using annexin V staining of
CEM cells (g) after a period of 72 h of treatment and measuring caspase 7/9
activity upon treatment of CUTLL1 T-ALL cells with GSKJ5 or GSKJ4 over a
period of 24 h (h). i, Time course studies of annexin V (top) and cell cycle
analysis (bottom) of CUTLL1 cells over a period of 72 h during GSKJ4
treatment according to the scheme on top of the figure. j, Expression of the
wild-type and catalytic mutant of JMJD3 in T-ALL (CEM) cells. k, Cell growth
analysis of T-ALL cells overexpressing wild-type JMJD3 or a catalytic mutant
of JMJD3 upon GSKJ4 treatment over a period of 72 h. Average results from
three independent experiments are shown. l, Cell growth of LSK cells upon
treatment with the control (2 μM) and different concentrations of the inhibitor
GSKJ4. m, Annexin V staining of THP-1 (AML) cells after a period of 72 h of
GSKJ4 or GSKJ5 (control) treatment at 2 μM concentration. The average
results from three independent experiments are shown. n, The mRNA levels are
shown for three classical NOTCH1 targets (HEY1, NRARP and NOTCH1) over
a period of 72 h during GSKJ4 treatment. The average results from three
independent experiments are shown.
Extended Data Figure 9 | GSKJ4 treatment leads to increased H3K27me3
levels on NOTCH target genes through specific inhibition of JMJD3
activity. a–c, Analysis of the promoter area of HEY1 (a), NOTCH1 (b) and
NRARP (c) for H3K27me3, H3K27me1, NOTCH1 and JMJD3 enrichment
over a period of 24 h during GSKJ4 treatment. The average results from three
independent experiments are shown. d, Analysis of the total protein extracts
from CUTLL1 cells for JMJD3 and NOTCH1. e, Analysis of the chromatin
fraction from CUTLL1 cells for the repressive mark H3K27me3, the activating
marks H3K27me1 and H3K4me3, as well as total histone H3 levels.
Representative plots from three independent experiments are shown.
f, Snapshots of GSKJ4-associated H3K27me3 changes in major NOTCH1 and
JMJD3 targets. g, ChIP-qPCR analyses for UTX binding to the NOTCH target
genes HEY1, NRARP and NOTCH1 (RBBP6 was used as positive control).
The average results from three independent experiments are shown.
Extended Data Figure 10 | JMJD3 as a pivotal factor in NOTCH1-mediated
oncogenic activation in T-cell leukaemia. a, Schematic representation of the
H3K27me3 writer (the polycomb complex, left) and eraser (JMJD3, right).
EZH2 is the catalytic subunit of the complex, acting through its SET domain,
whereas the EED subunit recognizes the H3K27me3 mark and aids in
polycomb binding. JmjC domain activity is inhibited by the small molecule
inhibitor GSKJ4. b, Model of the key role of JMJD3 in the NOTCH1
transcriptional complex. Before activation of the NOTCH signalling pathway,
the promoters of classical NOTCH target genes are bound by RBP-Jκ, together
with components of the co-repressor complexes and PRC2, leading to low
gene expression. After the binding of NOTCH1 and its co-activator MAML1,
the genes are activated through the recruitment of JMJD3 and the MLL
complex, with simultaneous eviction of PRC2, which leads to the
demethylation of H3K27me3 and the methylation of H3K4 to H3K4me3.
Structure and mechanism of Zn²⁺-transporting P-type ATPases


Kaituo Wang, Oleg Sitsel, Gabriele Meloni, Henriette Elisabeth Autzen, Magnus Andersson, Tetyana Klymchuk,
Anna Marie Nielsen, Douglas C. Rees, Poul Nissen & Pontus Gourdon


Zinc is an essential micronutrient for all living organisms. It is required
for signalling and proper functioning of a range of proteins involved
in, for example, DNA binding and enzymatic catalysis. In prokaryotes
and photosynthetic eukaryotes, Zn²⁺-transporting P-type ATPases
of class IB (ZntA) are crucial for cellular redistribution and detoxification
of Zn²⁺ and related elements. Here we present crystal structures
representing the phosphoenzyme ground state (E2P) and a
dephosphorylation intermediate (E2·Pi) of ZntA from Shigella sonnei,
determined at 3.2 Å and 2.7 Å resolution, respectively. The structures
reveal a similar fold to Cu⁺-ATPases, with an amphipathic helix at
the membrane interface. A conserved electronegative funnel connects
this region to the intramembranous high-affinity ion-binding site
and may promote specific uptake of cellular Zn²⁺ ions by the transporter.
The E2P structure displays a wide extracellular release pathway
reaching the invariant residues at the high-affinity site, including
C392, C394 and D714. The pathway closes in the E2·Pi state, in which
D714 interacts with the conserved residue K693, which possibly stimulates
Zn²⁺ release as a built-in counter ion, as has been proposed for
H⁺-ATPases. Indeed, transport studies in liposomes provide experimental
support for ZntA activity without counter transport. These
findings suggest a mechanistic link between PIB-type Zn²⁺-ATPases
and PIII-type H⁺-ATPases and at the same time show structural features
of the extracellular release pathway that resemble PII-type ATPases
such as the sarcoplasmic/endoplasmic reticulum Ca²⁺-ATPase
(SERCA) and Na⁺,K⁺-ATPase. These findings considerably increase
our understanding of zinc transport in cells and represent new possibilities
for biotechnology and biomedicine.

Zinc is an abundant transition metal in life, serving multiple functions,
yet elevated concentrations of Zn²⁺ are toxic, as are its heavy-metal mimetics
such as Cd²⁺ and Pb²⁺ (ref. 7). Zn²⁺-transporting P-type ATPases
(the PIB-2-ATPases ZntA and CadA) are active transporters that are crucial
for the cellular detoxification of these elements, as well as for the
subcellular redistribution of micronutritional zinc. The significance of
Zn²⁺-ATPases is further underscored by the presence of multiple and
occasionally redundant genes encoding these enzymes in higher plants
such as Arabidopsis thaliana. The lack of ZntA in animals, the prevalence
of such enzymes in pathogens, and the fact that zinc is exploited
in the host-microorganism arms race (for example, to inactivate vital
virulence determinants of Streptococcus pneumoniae) make these PIB-
ATPases attractive targets for new antibiotics, antifungals and herbicides.
ZntA couples ATP hydrolysis at the intracellular A (actuator/dephosphorylation),
P (phosphorylation) and N (nucleotide binding) domains
to ion efflux through the M (transmembrane) domain (Extended Data
Fig. 1a). The mechanism is schematically described by the ‘Post-Albers’
cycle, which has four principal states (E1, E1P, E2P and E2) that define
alternating access to an intramembranous high-affinity ion-binding site
(Fig. 1a, centre, and Extended Data Fig. 1b). However, the only structures
that have been determined for this class of protein are for the related
Cu⁺-transporting PIB-ATPase CopA and for a ZntA domain, limiting
the functional and mechanistic understanding of this class of proteins.
Fundamental questions that remain to be answered include how
zinc transport is accomplished across the membrane and coupled to
ATPase activity and how sequence motifs that are specific to Zn²⁺-
ATPases relate to structure and function.
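The ordering of the four principal Post-Albers states named above can be written down as a small state machine. The sketch below is only a bookkeeping aid under the assumption of a strictly sequential cycle E1 → E1P → E2P → E2 → E1; the names `State`, `next_state` and `run_cycle` are illustrative and carry no structural or kinetic information.

```python
from enum import Enum
from typing import List


class State(Enum):
    """The four principal states of the Post-Albers cycle."""
    E1 = "E1"    # ion-binding site accessible from the intracellular side
    E1P = "E1P"  # phosphorylated, ion bound
    E2P = "E2P"  # phosphoenzyme open towards the extracellular side
    E2 = "E2"    # dephosphorylated, before return to E1


# Assumed sequential order of the cycle.
CYCLE: List[State] = [State.E1, State.E1P, State.E2P, State.E2]


def next_state(state: State) -> State:
    """Return the state that follows `state` in the cycle, wrapping E2 back to E1."""
    return CYCLE[(CYCLE.index(state) + 1) % len(CYCLE)]


def run_cycle(start: State = State.E1, steps: int = 4) -> List[State]:
    """Return the sequence of states visited in `steps` transitions from `start`."""
    visited = [start]
    for _ in range(steps):
        visited.append(next_state(visited[-1]))
    return visited


if __name__ == "__main__":
    print(" -> ".join(s.name for s in run_cycle()))
```

The two crystal structures reported here correspond to E2P and to a dephosphorylation intermediate between E2P and E2, which this four-state simplification does not represent as a separate state.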

We have determined the crystal structures of two reaction cycle intermediates
of ZntA from S. sonnei, which is 99.2% identical to the Escherichia
coli ZntA (the best characterized member of the family) and is stimulated
by the equivalent ions in vitro (Fig. 1a and Extended Data Figs 2–4a).
Crystals were obtained using a modified HiLiDe (high concentrations
of lipid and detergent) technique (see Methods) and in the presence
of the zinc chelator TPEN (N,N,N′,N′-tetrakis(2-pyridinylmethyl)-1,2-
ethanediamine) plus either BeF₃⁻ or AlF₄⁻, mimicking the zinc-free
phosphoenzyme ground state (denoted E2P) and a dephosphorylation
intermediate (E2·Pi), respectively. The structures were determined at
3.2 Å and 2.7 Å resolution (Extended Data Table 1) and reveal a PIB-type
ATPase fold reminiscent of CopA, with intracellular A, P and N domains
and eight similarly arranged transmembrane segments (MA, MB and
M1-M6), albeit with shorter extracellular loops (Extended Data Fig. 5).
The heavy-metal binding domain (HMBD), a characteristic feature of
PIB-ATPases (Extended Data Fig. 1a), was, however, not visible in the
electron density maps, as was also the case for CopA. The intracellular
domains are arranged differently in the two S. sonnei ZntA structures,
in agreement with the equivalent states of CopA and SERCA: BeF₃⁻
mimics the phosphorylation of D436 (S. sonnei ZntA numbering throughout),
which is buried and protected by the catalytic TGE loop of the A
domain in this E2P-like state, whereas the TGE motif activates a water
molecule coordinated to AlF₄⁻, imitating dephosphorylation at D436
as in an E2·Pi-like state (Extended Data Fig. 6).

The single intramembranous high-affinity Zn²⁺-binding site of ZntA
deserves particular attention. Biochemical studies have indicated that
Zn²⁺ binding depends on C392 and C394 (in the CPC motif of the M4
segment), K693 (M5) and D714 (M6). In the structures, these four
residues overlap well with the equivalent cysteines, asparagine and methionine
in the corresponding E2·Pi state of Cu⁺-ATPases (Fig. 1b, c).
Further supporting an important functional role of these four residues,
the only other conserved side chains in the region that may participate
in Zn²⁺ binding are those of M187 and Y354, but our mutations of these
residues do not affect function (Fig. 2a). However, the K693 side chain
would be an unexpected ligand for Zn²⁺ (refs 16, 17), and indeed Zn²⁺
binding is unaffected by the mutation of lysine to alanine at this position
(K693A) (Fig. 2b), suggesting that binding is instead established by the
two cysteine thiolates and two oxygen ligands, possibly from D714 in
a bidentate fashion, which is a recurrent coordination motif of Zn²⁺-
binding sites. Congruent with this role, the relative activity of the
D714E mutant decreases with the increasing ionic radii and coordination
distances of Zn²⁺, Cd²⁺ and Pb²⁺ (Fig. 2a).
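The relative activities discussed here are expressed against the wild-type specific activity measured with the same ion. As a rough sketch of that normalisation, the snippet below uses the wild-type values quoted in the Figure 2 legend (592, 491 and 813 nmol Pi mg⁻¹ min⁻¹ for Zn²⁺, Cd²⁺ and Pb²⁺); the mutant replicate values are invented for illustration only, the function name `relative_activity` is hypothetical, and the use of the sample standard deviation is an assumption, because the text does not state which estimator was applied.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

# Wild-type specific activities (nmol Pi per mg per min) from the Figure 2 legend.
WT_SPECIFIC_ACTIVITY = {"Zn2+": 592.0, "Cd2+": 491.0, "Pb2+": 813.0}


def relative_activity(replicates: Sequence[float], ion: str) -> Tuple[float, float]:
    """Return (mean, sample s.d.) of mutant activity as a percentage of wild type.

    `replicates` are specific activities of a mutant measured with `ion`,
    in the same units as the wild-type reference values.
    """
    wild_type = WT_SPECIFIC_ACTIVITY[ion]
    percentages = [100.0 * value / wild_type for value in replicates]
    return mean(percentages), stdev(percentages)


if __name__ == "__main__":
    # Hypothetical replicate values, not measurements from the study.
    hypothetical_mutant = [290.0, 296.0, 302.0]
    avg, sd = relative_activity(hypothetical_mutant, "Zn2+")
    print(f"{avg:.1f} ± {sd:.1f} % of wild type")
```

Normalising per ion matters because the wild-type enzyme itself turns over at different rates with Zn²⁺, Cd²⁺ and Pb²⁺, so absolute activities are not directly comparable between ions.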

What, then, is the role of the essential K693? One striking difference
between the ion-binding region (between M4, M5 and M6) of
CopA and ZntA is the aforementioned D714 in S. sonnei ZntA. The side
chain of D714 is stabilized by K693 in the E2·Pi state (Fig. 3a and Extended
Data Fig. 7a): this interaction potentially has important functional
implications, with the charge-stabilizing lysine residue possibly acting
as a built-in counter ion in zinc-free states (that is, as observed here).
Such a mechanism was proposed earlier for plasma membrane H⁺-
ATPases (Fig. 3c). Indeed, residues R655 and D684, which form this
pair in the Arabidopsis thaliana H⁺-ATPase AHA2, are located at positions
that almost overlap the positions of K693 and D714 of S. sonnei
ZntA, pointing to common principles of ion transport in PIB- and PIII-
type ATPases.

Transport and putative H⁺ counter transport were then analysed in
proteoliposomes. Whereas Zn²⁺ accumulated in vesicles (Fig. 2c), we
were unable to detect any changes in intravesicular pH (Fig. 2d). As a positive
control, we used the Ca²⁺-ATPase LMCA1 from Listeria monocytogenes,
which showed clear H⁺ antiporter activity (Fig. 2d). Furthermore,
while the electron density maps allowed the identification of several water
molecules in the E2–AlF₄⁻ structure, no sites were detected that could
be ascribed to, for example, K⁺, Na⁺, Ca²⁺ or Mg²⁺ counter ions, and
these cations were also not required for ZntA activity (Extended Data
Fig. 4b and see Methods for details). All considered, our observations
thus support zinc flux without associated counter-ion transport.

In Cu⁺-ATPases, ion release has been proposed to occur via a pathway
lined by MA, M2 and M6 that remains open in the E2P and E2·Pi
states. We were consequently surprised to find that no extracellular
pathway was evident in the E2·Pi state of S. sonnei ZntA, in contrast to
CopA, and that, instead, substantial conformational changes occurred
in the M domain in the E2P to E2·Pi state transition. These conformational
changes resemble, by contrast, those of SERCA, in which a wide
opening appears between M1-M2, M3-M4 and M5-M6 in the E2P
state and reseals in the occluded E2·Pi state (Fig. 3d and Extended Data
Fig. 1). In ZntA, the extracellular portions of M5-M6 shift away from the
Zn²⁺-binding CPC motif, and rearrangements (less pronounced than
those of SERCA) in M2 and M3-M4 expose the high-affinity site to the
extracellular side (Fig. 3d and Extended Data Fig. 7a). This SERCA-like
pathway must allow release of free zinc into the extracellular environment,
as further supported by an observed reorientation of the sulphur
side chains of the CPC motif away from the ion-binding site between
the E2P and E2·Pi states. With K693 being flexible without a strong
interaction with D714 in the E2P state (Fig. 3b), it is possible that K693 has
an additional role: electrostatic repulsion against the re-entry of Zn²⁺,
possibly further stimulated by E202 guiding Zn²⁺ into the extracellular
environment. The equivalent residue to E202 in SERCA and CopA (E90
and E189, respectively) has been proposed to serve a similar purpose,
and supporting this notion, E202 is critical for enzyme function (Fig. 2a).
Furthermore, E202 showed considerable conformational flexibility in a
60-ns molecular dynamics simulation of the open E2P structure, linking
the intramembranous ion-binding site to the extracellular environment,
as is also supported by steered molecular dynamics simulations
of Zn²⁺ passage from the CPC motif to the extracellular environment
(Extended Data Fig. 7b–e).

Figure 1 | Structures of the S. sonnei Zn²⁺-
ATPase. a, The E2-BeF₃⁻ (E2P, left) and E2-
AlF₄⁻ (E2·Pi, right) structures with class-specific
helices MA, MB and MB′ coloured in cyan, helices
M1-M6 coloured in beige, and the A, P and N
domains coloured in yellow, blue and red,
respectively (domain names are highlighted in
bold). Key residues for function are highlighted. An
extracellular release pathway (white surface) is
present only in the E2P state, as computed with
CAVER. A schematic Post-Albers reaction cycle
for ZntA is shown (centre) with the experimentally
determined structures marked in red. b, Close view
of the intramembranous ion-binding region
coloured as in a, displaying the proposed Zn²⁺-
binding residues C392 and C394 (in M4), K693
(M5) and D714 (M6). c, Equivalent view to b of
the Legionella pneumophila Cu⁺-ATPase CopA
(Protein Data Bank (PDB) ID, 3RFU). Critical
CopA residues overlie equally important residues
of Zn²⁺-ATPases (see also Extended Data Fig. 2).
Side chain atoms are depicted in blue (nitrogen)
and red (oxygen).

One important consideration is how Zn²⁺ is initially delivered to
ZntA from the intracellular milieu. Although the current structures of
Zn²⁺-free states are outward-oriented and therefore closed towards
the intracellular side (Extended Data Fig. 1b), they hint at how Zn²⁺ entry
may take place. The uptake of intracellular cations by P-type ATPases
is expected to occur at the membrane interface at M1 (refs 4, 5, 11, 18,
22, 23), and in CopA through an entry site with an invariant methionine.
Sequence analyses show that M1 segments in Zn²⁺-ATPases also harbour
a conserved methionine (M187), although this residue is located
closer to the CPC motif in ZntA (Fig. 1b, c and Extended Data Fig. 2).
Mutational studies indicate that this residue alone is not essential
(Fig. 2a). However, in contrast to CopA, the entry area in Zn²⁺-ATPases
displays a conserved and negatively charged funnel structure (lined by
E184, E214 and D348 at the membrane interface) that stretches towards
the intramembranous ion-binding site and that is plugged by M187
and F210, the latter of which is conserved as a phenylalanine or tyrosine
in ZntA (Fig. 3e, f, Extended Data Fig. 8a and Extended Data Table 2).
Whereas the activity of the F210A mutant is only moderately affected
in vitro, the M187A and F210A double mutant is inactive, with less
zinc binding than the wild type (Fig. 2a, b). We note that the equivalent
residue to S. sonnei ZntA F210 in H⁺-ATPases, N106, is a gatekeeper for
H⁺ entry (Fig. 3c). With the conformational changes anticipated
for the shift to the E1 states, Zn²⁺ may thus be guided by M187 through
the funnel and led directly to C392 in the high-affinity site, which is
capped by M148 and F210. Because the funnel is narrow and negatively
charged, we find it likely that free Zn²⁺ ions and not a glutathione-
ligated complex will interact with the funnel, unlike the proposed uptake
mechanism of the heavy-metal ABC exporter Atm1 from Novosphingobium
aromaticivorans.

Figure 2 | Functional studies of zinc, cadmium, lead and counter-ion
transport in S. sonnei ZntA. a, ATP turnover associated with different
S. sonnei ZntA constructs in detergent-lipid solution, relative to wild-type
activity with each ion. The specific activities of wild-type (WT) S. sonnei
ZntA with Zn²⁺, Cd²⁺ and Pb²⁺ were 592 ± 23, 491 ± 10 and 813 ±
23 nmol Pi mg⁻¹ min⁻¹, respectively; the mean ± s.d. of technical replicates
is shown (n = 3). b, Zn²⁺ binding to different S. sonnei ZntA constructs, as
determined using the dye Zincon. ZntA binds to two Zn²⁺ ions: one binds to
the high-affinity site in the transmembrane domain, and one binds to the
HMBD. The mean ± s.d. of biological replicates is shown (n = 3). c, d, Zinc
transport by wild-type and D436N S. sonnei ZntA proteoliposomes (c) and
measurements of H⁺ counter-ion transport by wild-type S. sonnei ZntA and
Ca²⁺-ATPase LMCA1 proteoliposomes (d), as monitored using the zinc-
selective chelator FluoZin-1 (c) and the pH indicator pyranine (d) (see also
Extended Data Fig. 4c and Methods). exc., excitation; em., emission.

The role of the HMBD of PIB-ATPases is puzzling. In Cu⁺-
ATPases, a platform formed by an amphipathic helix, MB′, at the
intracellular membrane interface (Fig. 3g) has been proposed to serve as an
interaction site for HMBDs, as well as for metal-donating chaperones,
allowing allosteric regulation and copper supply to the ATPase core. The
MB′ platform and its amphipathic character are maintained in ZntA,
exposing several positively charged residues to the intracellular side (Fig. 3e, f
and Extended Data Table 2). However, as no equivalent chaperones
are known for zinc, the metal is most likely delivered by chelators such
as glutathione, rendering the HMBD the most likely interaction candidate
for the MB′ platform in ZntA. Using a known structure of the
almost identical HMBD of E. coli ZntA, the ClusPro docking server
docks the domain immediately at MB′, stabilized by charge complementation
(as was also proposed for CopA) with the metal-binding
CXXC motif (where X denotes any amino acid residue) being solvent
accessible in the vicinity of the entry funnel (Extended Data Fig. 8b–d).
Truncations and mutations of the HMBD retain a functional ZntA, only
with reduced activity (Fig. 2a), and we therefore favour an autoregulatory
role for this domain.

Figure 3 | Details of the S. sonnei ZntA
structures. The blue mesh represents the final
2Fo − Fc electron density, contoured at 1σ
(other colours as in Fig. 1a unless noted). See
Methods for additional details on figures. a, Close
view of K693 and D714 in the E2·Pi state. b, Close
view of K693 and D714 in the E2P state.
c, Comparison of the transmembrane regions of
S. sonnei ZntA-AlF₄⁻ in the E2·Pi state, A.
thaliana AHA2 H⁺-ATPase (in the E1-ATP state;
PDB ID, 3B8C; grey) and an E1-ATP model of
S. sonnei ZntA (brown). Inset, identical view of the
equivalent region of the E1-ATP (black) and E2·Pi
(orange) states of SERCA (PDB ID, 1T5S and
3BOR, respectively). d, Structural differences
between the extracellular portions of the E2P
(coloured as in Fig. 1a) and the E2·Pi structures
(black) (see also Extended Data Fig. 7a). e–g, The
MB′ platform of the E2·Pi state of S. sonnei ZntA
(e, f) and L. pneumophila CopA (g).

Figure 4 | Putative zinc transport mechanism
of ZntA. A transport cycle based on schematic
models of the E1 and E1P states and the E2P and
E2·Pi structures. In the presence of intracellular
zinc, Zn²⁺ enters the ATPase through the
electronegative funnel (red) at the MB′
platform (1). Upon Zn²⁺ binding (2) to the
intramembranous ion-binding site (grey circle),
F210 and M187 occlude the ion entry funnel (3),
preventing backflow of Zn²⁺. Substantial domain
rearrangements in transition to the E2P state open
the extracellular pathway (4), lowering the affinity
for Zn²⁺ and mediating Zn²⁺ release (5), possibly
stimulated by K693 (6). Dephosphorylation
triggers closure of the transmembrane domain, in
which K693 (as a built-in counter ion) forms a salt
bridge with D714. Upon dephosphorylation, the
side chains move to their initial positions (7) before
an E2 to E1 transition is stimulated by the presence
of intracellular Zn²⁺.

The first atomic structures of a Zn²⁺-transporting PIB-type ATPase
reveal unique features. These include an intracellular, negatively charged
and presumably ion-catching funnel, a high-affinity Zn²⁺-binding site
with a putative lysine switch acting as a built-in counter ion (with an
unexpected similarity to PIII-type plasma membrane H⁺-ATPases) and
an extracellular Zn²⁺-release pathway (which, unlike that of copper-transporting
P-type ATPases, resembles that of the classical PII-type
ion pumps). These findings significantly increase our understanding of
zinc transport in cells (Fig. 4) and represent new possibilities for
biotechnology and biomedicine. Detailed insight into the transport mechanism
and specificity determinants may, for example, aid in using plant
biotechnology to accumulate valuable zinc in edible plants or to
decontaminate heavy metals in soil, and the release pathway may be a
favourable target site for new antibiotics.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS


Protein expression and purification. Several ZntA homologues from different
prokaryotes were cloned and tested for expression, purification and crystallization
in a parallel approach. S. sonnei ZntA (UniProt ID, Q3YW59) was cloned into pET-52
with a carboxy-terminal hexahistidine tag and transformed into the C41(DE3)
E. coli expression strain. Cells were grown in LB medium at 37 °C to an absorbance
at 600 nm (A600) of 1.0, and the shaker flasks were cooled for 30 min with iced water.
Expression was then induced with 1 mM isopropyl-β-D-thiogalactoside (IPTG) (final
concentration) at 20 °C for 20 h. Harvested cells were resuspended in TKG buffer
(17 g cells per 100 ml buffer) containing 20 mM Tris-HCl, pH 7.5, 200 mM KCl
and 20% (v/v) glycerol and then frozen at −20 °C. Before cell rupture, the following
were added to the solution (final concentrations): 5 mM MgCl₂, 5 mM β-mercaptoethanol
(BME), 2 µg ml⁻¹ DNase I, 1 mM phenylmethanesulphonyl fluoride and Roche protease
inhibitor cocktail (1 tablet per 200 ml). The cells were then lysed using a high-pressure
homogenizer (three times, 15,000-20,000 p.s.i.). The sample was then kept at 4 °C
throughout the entire purification procedure until crystallization. Cell debris was
removed by centrifugation at 23,000g for 20 min, and membranes were isolated by
ultracentrifugation at 250,000g for 3 h. The membrane pellet was resuspended in
20 mM Tris-HCl, pH 7.5, 200 mM KCl, 20% (v/v) glycerol, 5 mM MgCl₂ and 5 mM
BME, to a final concentration of 12 ml buffer per g membrane, and then exposed to
10 mg ml⁻¹ (final concentration) octaethylene glycol monododecyl ether (C₁₂E₈)
for 1 h with gentle stirring. Unsolubilized material was removed by
ultracentrifugation at 250,000g for 30 min. The supernatant was supplemented with imidazole
and solid KCl (final concentrations of 50 and 500 mM, respectively), filtered (0.22 µm)
and then applied to several sequential 5-ml pre-packed Ni²⁺-chelating columns
(HisTrap HP, GE Healthcare; material from 4 l of cells per column). The columns
were washed with buffer containing 20 mM Tris-HCl, pH 7.5, 200 mM KCl, 20%
(v/v) glycerol, 5 mM MgCl₂, 5 mM BME, 0.15 mg ml⁻¹ C₁₂E₈ and 50 mM imidazole
until the absorption at 280 nm (A280) reached the baseline, and elution was
achieved with an additional 450 mM imidazole (final concentration). The S. sonnei
ZntA-containing fractions were pooled, and the protein was concentrated to
20 mg ml⁻¹ and then subjected to size-exclusion chromatography. Protein (50 mg)
was injected into an XK16/100 column prepared with a 100 ml column volume of
Superose 6 Prep Grade (GE Healthcare) equilibrated with 20 mM MOPS-KOH,
pH 6.8, 80 mM KCl, 20% (v/v) glycerol, 5 mM MgCl₂, 5 mM BME and 0.15 mg ml⁻¹
C₁₂E₈, and the resultant main peak from each run was pooled and concentrated to
12 mg ml⁻¹, aliquoted, flash frozen in liquid nitrogen and stored at −80 °C. Yields
exceeded 10 mg purified protein per l of E. coli cell culture. The final protein purity
was monitored using SDS-PAGE, and the protein concentration was assessed by
measuring A280.

Crystallization. S. sonnei ZntA aliquots were thawed and supplemented with
4 mg ml⁻¹ (final concentration) C₁₂E₈ and incubated without stirring for 16 h at
4 °C, reaching a modified HiLiDe condition. The sample was then ultracentrifuged
at 100,000g for 10 min, diluted to a final concentration of about 6-8 mg ml⁻¹ protein
and treated with 10 mM NaF, 2 mM AlCl₃ or BeSO₄, 2 mM EGTA and 10 µM
N,N,N′,N′-tetrakis(2-pyridinylmethyl)-1,2-ethanediamine (TPEN) (final
concentrations) for 30 min. Crystals were grown using the hanging drop vapour diffusion
method at 19 °C. The best S. sonnei ZntA E2–AlF₄⁻ crystals were grown using a
reservoir with 300 mM lithium acetate, 3% (v/v) t-butanol, 14% polyethylene glycol
2000 monomethyl ether (PEG 2000 MME), 7% (v/v) sorbitol, 10% (v/v) glycerol
and 5 mM BME. By contrast, the best S. sonnei ZntA E2–BeF₃⁻ crystals were obtained
using a reservoir with 100 mM MgCl₂, 200 mM lithium acetate, 17% (v/v) PEG 2000
MME, 10% (v/v) glycerol and 5 mM BME. More than 1,000 crystals were fished with
litho-loops, flash cooled in liquid nitrogen and tested at synchrotron sources. The
final data sets were collected at the Swiss Light Source, Villigen, Switzerland, using
the X06SA beam line and a wavelength of 1.0000 Å (0.9787 Å for Se-E2·Pi[AlF₄⁻]), a
temperature of 100 K and the Pilatus 6M pixel detector.

Data processing and structure determination. Data were processed and scaled
with the program XDS to 2.7 Å and 3.2 Å resolution. The E2–AlF₄⁻ and E2–BeF₃⁻
crystals belonged to space groups C222₁ and P2₁, respectively. Initial phases for the
E2–AlF₄⁻ form were obtained with molecular replacement using Phaser, and
monomer A of L. pneumophila CopA (PDB ID, 3RFU) was used as a search model.
Anomalous peaks in a Se-SAD (Se single-wavelength anomalous diffraction) data
set of the E2–AlF₄⁻ form were calculated using the molecular replacement phases,
and the Se-Met positions were used to guide model building. Model building was
performed with Coot, using the L. pneumophila CopA structure as a template.
Model refinement was carried out using phenix.refine, applying TLS parameters
in the late stages of refinement only, reaching Rcryst/Rfree values of 20.7/24.0
(E2–AlF₄⁻) and 21.0/28.1 (E2–BeF₃⁻). The E2–BeF₃⁻ form was also determined by
molecular replacement using the refined S. sonnei ZntA E2–AlF₄⁻ structure as a
search model and refined using a similar procedure. The final refinement statistics
are listed in Extended Data Table 1. Structures were analysed using MolProbity,
indicating that 96.35/94.42, 3.49/5.08 and 0.17/0.51% of the residues were in the
favoured, allowed and non-favoured regions, with 6.64/10.91% rotamer outliers
and 6.48/10.11% as clash scores, respectively, for the E2·Pi and E2P states.

Functional characterization. The purification protocol for functionally assessed
S. sonnei ZntA constructs was similar to the one described for crystallization (ΔHMBD
lacks the first 103 residues). However, following affinity chromatography, the samples
were treated with 1 mM EDTA and then subjected to a 5-ml HiTrap desalting
column (GE Healthcare) using the equivalent SEC buffer to that for crystallization.
Release of inorganic phosphate (Pi) associated with ATPase activity was assessed
using the Baginski assay. The reaction system contained 5 µg protein, 40 mM
MOPS-KOH, pH 6.8, 150 mM NaCl, 5 mM KCl, 5 mM MgCl₂, 3.0 mg ml⁻¹ C₁₂E₈,
1.2 mg ml⁻¹ soybean lipid, 20 mM cysteine, 5 mM NaN₃ and 0.25 mM Na₂MoO₄
in a total volume of 50 µl. This solution was first incubated with different transition
metal ions or EGTA, supplemented with 3 mM ATP (final concentration) to start
the reaction and then incubated for 10 min while shaking at 37 °C. Freshly prepared
stop solution (50 µl) (2.5% (w/v) ascorbic acid, 0.4 M (v/v) HCl, 0.48% (w/v)
(NH₄)₂MoO₄ and 0.8% SDS) was then added to stop the reaction and start colour
development. After 10 min incubation at 18 °C, 75 µl colorimetric solution (2% (w/v)
arsenite, 2% (v/v) acetic acid and 3.5% (w/v) sodium citrate) was added to the mixture
and incubated for another 30 min at 18 °C. Absorbance was measured at 860 nm.
One experiment with three replicates was performed for each construct and ion.

Reconstitution in proteoliposomes. E. coli polar lipids (25 mg ml⁻¹) and egg-yolk
phosphatidylcholine (25 mg ml⁻¹) in chloroform were mixed at a 3:1 (w/w)
ratio and dried under a nitrogen stream and continuous rotation to form a
homogeneous thin film in a glass balloon. Lipids were desiccated overnight under vacuum
(protected from light) and suspended in 1 mM dithiothreitol (DTT) to a final
concentration of 25 mg ml⁻¹. A concentrated stock (10×) was used to bring the
suspension to a final concentration of 20 mM MOPS, pH 6.8, 250 mM NaCl and 1 mM
DTT. Lipids were subjected to three rounds of freeze-thawing in liquid nitrogen.
Proteoliposomes were prepared by extrusion (11 times) through 0.2-µm polycarbonate
filters to form large unilamellar vesicles (LUVs) using a mini extruder (Avanti
Polar Lipids) equipped with two 1-ml gas-tight syringes. Proteoliposomes were
destabilized by the addition of n-dodecyl-β-D-maltoside (DDM) to a final concentration
of 0.02% (w/v) and tilting for 1 h at 18 °C and were subsequently placed on ice for
10 min. Wild-type and D436N S. sonnei ZntA, as well as LMCA1, were added
(1-2 mg ml⁻¹, purified essentially as described for crystallization) to a final
protein-to-lipid ratio of 1:20 (w/w), and the mixture was incubated for 1 h at 4 °C under tilting.
Control liposomes were prepared using the same procedure without the addition
of protein. Detergent was removed through consecutive incubations with activated
Bio-Beads SM-2 (Bio-Rad), by exchanging the beads after 1, 16, 18 and 20 h. The
Bio-Beads were subsequently removed, and the proteoliposomes were collected by
ultracentrifugation at 163,000g for 45-60 min at 4 °C and resuspended in 20 mM
MOPS, pH 6.8, 250 mM NaCl and 1 mM DTT (Buffer PL) to a final protein
concentration of 1 mg ml⁻¹.

Zinc transport assays using FluoZin-1. Wild-type and D436N S. sonnei ZntA
proteoliposomes were diluted 1:2 in 20 mM MOPS, pH 6.8, 250 mM NaCl and
1 mM DTT to a protein concentration of 0.5 mg ml⁻¹. A stock of the fluorescent
Zn²⁺ chelator FluoZin-1 (2 mM in H₂O) was added to a final concentration of
200 µM. FluoZin-1 encapsulation was performed by three freeze-thaw cycles and
subsequent extrusion through 0.2-µm polycarbonate filters. Proteoliposomes were
collected by ultracentrifugation at 163,000g for 45-60 min at 4 °C, and the
supernatant containing excess FluoZin-1 was removed. Proteoliposomes were washed
with 1 ml Buffer PL, collected by ultracentrifugation and suspended in the same
buffer (1 ml). Transport assays were performed in the presence of a final
concentration of 10 mM MgCl₂ on 100 µl samples. The reactions were initiated by the addition
of concentrated stocks of ZnCl₂ (1 mM) and ATP (10 mM) to final
concentrations of 40 µM ZnCl₂ and 1 mM ATP. A fluorescence time course was measured
in a 96-well plate reader using an excitation wavelength of 485 nm and an emission
wavelength of 520 nm. Experiments in the absence of ATP were performed in
parallel as controls. The ATP-dependent Zn²⁺ transport was determined as ΔF/F₀,
where ΔF is the difference between the fluorescence measured in the presence and
the absence of ATP, and F₀ is the fluorescence recorded immediately after ATP
addition. Each condition was tested at least in duplicate, and one representative trace
is shown for each.
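
As an illustration only, the ΔF/F₀ normalization described above could be computed as in the following Python sketch. The function name, the array layout and the example readings are assumptions made for this sketch and are not taken from the study; F₀ is taken here to be the first reading of the +ATP trace, and both traces are assumed to be sampled at the same time points.

```python
import numpy as np


def atp_dependent_signal(f_plus_atp, f_minus_atp):
    """Return dF/F0 for two fluorescence traces sampled at the same time points.

    dF is the point-by-point difference between the +ATP and -ATP traces.
    F0 is assumed to be the first +ATP reading (immediately after ATP addition).
    """
    f_plus = np.asarray(f_plus_atp, dtype=float)
    f_minus = np.asarray(f_minus_atp, dtype=float)
    if f_plus.shape != f_minus.shape:
        raise ValueError("traces must have the same number of time points")
    f0 = f_plus[0]
    return (f_plus - f_minus) / f0


# Hypothetical plate-reader readings (arbitrary units), for illustration only.
plus_atp = [100.0, 110.0, 125.0, 140.0]
minus_atp = [100.0, 101.0, 102.0, 102.0]
signal = atp_dependent_signal(plus_atp, minus_atp)
```

The same function would apply to the pyranine assay if the −ATP trace is replaced by the control-liposome trace, since only the reference trace differs between the two definitions.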

H⁺ counter-ion transport assays using pyranine. S. sonnei ZntA (wild type) and
LMCA1 and control proteoliposomes were diluted 1:2 to a final concentration of
20 mM MOPS, pH 7.0, 250 mM NaCl, 100 mM KCl, 10 mM MgCl₂ and 1 mM
DTT (Buffer Counter). A stock of the fluorescent pH indicator pyranine (0.1 M in
H₂O) was added to a final concentration of 10 mM. Pyranine encapsulation was
performed using three freeze-thaw cycles and subsequent extrusion through 0.2-µm
polycarbonate filters. Proteoliposomes were collected by ultracentrifugation at 163,000g
for 45-60 min at 4 °C, and the supernatant was removed. Proteoliposomes were
washed with 1 ml Buffer Counter, collected by ultracentrifugation and suspended
in the same buffer. The reactions were initiated by the addition of concentrated stocks
of ZnCl₂ (1 mM) or CaCl₂ (2.5 mM) and ATP (10 mM) to obtain a final
concentration of 40 µM ZnCl₂ (for S. sonnei ZntA) or 100 µM CaCl₂ (for LMCA1), and
1 mM ATP. Experiments in the absence of ATP were performed in parallel, as well
as experiments on control liposomes. A fluorescence time course was measured in
a 96-well plate reader using an excitation wavelength of 450 nm and an emission
wavelength of 520 nm. The ATP-dependent H⁺ counter-ion transport was
determined as ΔF/F₀, where ΔF is the difference between the fluorescence measured in
proteoliposomes and in control liposomes and F₀ is the fluorescence recorded
immediately after ATP addition. Each condition was tested at least in duplicate, and one
representative trace is shown for each.

Effect of Na⁺ or K⁺ on S. sonnei ZntA activity. To investigate the effect of Na⁺
or K⁺ on the activity of wild-type S. sonnei ZntA in detergent micelles, the buffer in
S. sonnei ZntA stock solutions was exchanged using 5-ml HiTrap desalting columns
packed with Sephadex G-25 resin with a K⁺-depleted solution (20 mM MOPS,
pH 6.8, 250 mM NaCl, 1 mM DTT, 0.01 mg ml⁻¹ C₁₂E₈ and 20% (v/v) glycerol)
or a Na⁺-depleted solution (20 mM MOPS, pH 6.8, 250 mM KCl, 1 mM DTT,
0.01 mg ml⁻¹ C₁₂E₈ and 20% (v/v) glycerol). To exchange the buffer in
proteoliposome preparations, 100 µl stocks were diluted in 1 ml 20 mM MOPS, pH 6.8, 250 mM
NaCl and 1 mM DTT or 20 mM MOPS, pH 6.8, 250 mM KCl and 1 mM DTT and
regenerated by using three freeze-thaw cycles. Proteoliposomes were collected by
ultracentrifugation, washed with 1 ml of the corresponding buffer and regenerated
by using three freeze-thaw cycles. This procedure was repeated, and the
proteoliposomes were subsequently extruded through 0.2-µm polycarbonate filters.
Proteoliposomes were collected by ultracentrifugation and suspended in a final volume
of 100 µl. The ATPase activity was determined using the Baginski method described
above in the presence of a final concentration of 40 µM ZnCl₂ or 1 mM EDTA for
background correction. One experiment with three replicates was performed for
each of the ions tested (Na⁺ and K⁺).

Effect of Mg²⁺ on S. sonnei ZntA activity in proteoliposomes. The buffer was
exchanged by diluting proteoliposome stocks in 20 mM MOPS, pH 6.8, 250 mM
NaCl, 80 mM KCl, 5 mM MgCl₂ and 1 mM DTT (Buffer MCA) or 20 mM MOPS,
pH 6.8, 250 mM NaCl, 80 mM KCl and 1 mM DTT (Buffer MCB) followed by
three freeze-thaw cycles and extrusion through 0.2-µm polycarbonate filters.
Proteoliposomes were collected by ultracentrifugation at 163,000g for 60 min at 4 °C
and suspended in the corresponding buffer. The ATPase activity was determined
using the Baginski method as described above. As the buffer used in the assays
contains ATP and MgCl₂ (required for ATP hydrolysis), the ATPase activity is
stimulated exclusively for correctly oriented S. sonnei ZntA (N-domain facing outside).
The presence or absence of Mg²⁺ in the proteoliposome lumen (Buffer MCA or
MCB) allows the identification of the putative requirement of Mg²⁺ counter-ion
transport for activity. One experiment with three replicates was performed.

Determination of Zn²⁺-binding stoichiometry using Zincon. S. sonnei ZntA
and ZntA mutants were titrated with 5-6 ZnCl₂ equivalents per mol (using a
10 mM ZnCl₂ stock) and subsequently desalted in 20 mM MOPS-KOH, pH 6.8,
80 mM KCl, 100 mM NaCl, 3 mM MgCl₂, 0.15 mg ml⁻¹ C₁₂E₈ and 1 mM TCEP
using a HiTrap desalting column packed with Sephadex G-25 resin to remove free or
loosely bound metal. The Zn²⁺ content of the samples was determined by colorimetric
quantification upon complex formation with 2-[5-(2-hydroxy-5-sulphophenyl)-
3-phenyl-1-formazyl]benzoic acid (Zincon). Briefly, metal release was achieved
upon incubation in a final concentration of 30 mM HCl. Subsequently, samples
were diluted to a final concentration of 100 mM borate, pH 9, and 4 M guanidinium
chloride, followed by the addition of Zincon to a final concentration of 40 µM.
The quantification of Zincon–Zn²⁺ complexes was performed in a 96-well plate
reader (Perkin Elmer) by measurement of the absorbance at 630 nm using a
calibration curve obtained by the addition of an increasing amount of ZnCl₂ in the
same buffer. The protein concentration was determined by a modified Bradford
assay. Protein solutions (10 µl) were incubated with 10 µl 1 M NaOH. Subsequently,
500 µl Bradford reagent was added, and quantification was performed in a 96-well
plate reader by measurement of the absorbance at 600 nm using a calibration curve
obtained with BSA standards. Three independent experiments with three replicates
for each experiment were conducted.
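
The arithmetic behind such a stoichiometry estimate can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes a linear calibration curve, assumes that the Zn²⁺ and protein concentrations refer to the same (undiluted) sample and are both expressed in µM, and uses synthetic calibration values. The function names are hypothetical.

```python
import numpy as np


def fit_calibration(zn_uM, absorbance):
    """Fit a straight line A = slope * [Zn] + intercept to the ZnCl2 standards."""
    slope, intercept = np.polyfit(zn_uM, absorbance, 1)
    return slope, intercept


def zn_per_protein(a630_sample, slope, intercept, protein_uM):
    """Convert a sample absorbance to [Zn] (uM) and divide by [protein] (uM)."""
    zn_uM = (a630_sample - intercept) / slope
    return zn_uM / protein_uM


# Synthetic, perfectly linear standards (illustration only, not measured data).
standards_uM = np.array([0.0, 5.0, 10.0, 20.0, 40.0])
standards_a630 = 0.05 + 0.02 * standards_uM

slope, intercept = fit_calibration(standards_uM, standards_a630)
ratio = zn_per_protein(0.25, slope, intercept, protein_uM=10.0)
```

With these invented numbers the sample contains 10 µM Zn²⁺ for 10 µM protein, that is, one Zn²⁺ per protein. In practice the dilution factors of the Zincon and Bradford measurements would have to be applied before taking the ratio.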

ClusPro docking. S. sonnei ZntA in the E2P state and the E. coli ZntA HMBD
fragment containing residues 46-118 (PDB ID, 1MWY) were chosen. The sequence
identity of the E. coli ZntA HMBD with the corresponding residues of the S. sonnei
ZntA HMBD was 97%. Docking was done using the ClusPro server. The best
model in the van der Waals + electrostatics scoring scheme was selected, as judged
by cluster size scores.

Molecular dynamics simulations. Two 60-ns atomistic molecular dynamics
simulations were run, one for each of the two ZntA structures. AlF₄⁻ bound to the
E2·Pi structure was modelled as H₂PO₄⁻ as described previously. D436–BeF₃⁻ in
the E2P structure was modelled as a phosphorylated aspartate using CHARMM27
parameters. The bound Mg²⁺ was retained in both structures. ZntA was embedded
in a dioleoylphosphatidylcholine (DOPC) membrane based on the coordinates of a
pre-equilibrated slab multiplied eight times from the Laboratory of Molecular &
Thermodynamic Modelling, and the proteins were positioned according to
transmembrane alignment with the Orientations of Proteins in Membranes database
coordinates of the E2·Pi Cu⁺-ATPase structure (PDB ID, 3RFU). Lipids with
atoms within 0.8 Å of any protein atom were deleted. Finally, the protein–membrane
systems were further solvated with TIP3P water and neutralized with sodium.

Molecular dynamics simulations were run using the NAMD 2.8 program, employing
the CHARMM27 force field for proteins and the CHARMM36 force field for
lipids. Before simulation, the systems were subjected to 2,000 steps of conjugate
gradient minimization. Then, a 0.5-ns molecular dynamics simulation was performed
where everything but the lipid tails was kept constant (NVT ensemble, T = 298 K),
allowing the lipids to adapt to the protein to some extent. Next, the systems were
minimized for 1,000 steps after which all atoms were allowed to move freely for 0.5 ns
(NPT ensemble, T = 298 K, P = 1 atm) except for the protein backbones, which
were held fixed. Finally, all atoms were allowed to move freely in a production run
of 60 ns. The temperature was controlled by Langevin dynamics, and the
Nosé-Hoover-Langevin piston method was used for controlling the pressure. The
electrostatics were fully accounted for by applying the particle mesh Ewald method
with periodic boundary conditions. The van der Waals interactions were truncated
at 12 Å, applying a switching function at 10 Å. The neighbour list containing
all pairs of atoms for which non-bonded interactions are calculated included atoms
within 14 Å of each other and was updated every 20 steps. Bonded interactions
were evaluated every 1 fs, while electrostatic and van der Waals interactions were
evaluated every 4 and 2 fs, respectively. Each production run was for 60 ns,
producing 60,000 frames, of which 2,000, evenly spread over the simulation time, were
used for analysis.
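
The frame bookkeeping implied by these numbers can be checked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are hypothetical, and the assumption that the analysed frames were taken at a fixed stride starting from the first saved frame is ours, since the text states only that they were evenly spread.

```python
# Values quoted in the text.
production_ns = 60
frames_saved = 60_000
frames_analysed = 2_000

# Time between consecutive saved frames, in picoseconds.
ps_per_frame = production_ns * 1_000 / frames_saved

# Assumed selection scheme: keep every stride-th frame, starting at frame 0.
stride = frames_saved // frames_analysed
analysed_indices = list(range(0, frames_saved, stride))

# Time between consecutive analysed frames, in nanoseconds.
ns_between_analysed = stride * ps_per_frame / 1_000
```

Under this assumption, one frame is saved per picosecond and every thirtieth frame enters the analysis.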

To describe the release pathway and accompanying Zn²⁺–protein interactions,
a steered molecular dynamics (SMD) approach was used. A divalent Zn²⁺ was
placed between the ion-coordinating residues C392, C394 and D714 in the E2P state,
and this was followed by deletion of three clashing water molecules and a 10,000-step
conjugate-gradient energy minimization. A force constant of 5 kcal mol⁻¹ Å⁻² and
velocities of 10-20 Å ns⁻¹ were applied to the ion directed from the inside out in
the z-direction in ten independent 1-ns simulations. Similar release pathways were
observed in the ten SMD simulations, and the number of Zn²⁺–protein interactions
within a 5 Å cut-off was calculated. The Cα atoms of six remote residues (148, 198,
367, 383, 385 and 705) were restrained to keep the system from drifting when
applying the force.
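
For orientation, the pulling force in a constant-velocity SMD run is commonly written as a harmonic spring whose anchor moves at the pulling speed. The sketch below evaluates that textbook expression with the force constant and speed quoted above. It assumes that this standard constant-velocity scheme was the one used, and the example displacement of the ion is invented for illustration.

```python
# Standard conversion: 1 kcal mol^-1 A^-1 is approximately 69.48 pN.
KCAL_PER_MOL_A_TO_PN = 69.48


def smd_force(k, v, t_ns, ion_displacement):
    """Spring force along z (kcal mol^-1 A^-1) in constant-velocity SMD.

    k                 force constant, kcal mol^-1 A^-2
    v                 pulling speed of the spring anchor, A ns^-1
    t_ns              elapsed time, ns
    ion_displacement  distance the ion has actually moved along z, A
    """
    return k * (v * t_ns - ion_displacement)


# Hypothetical snapshot: after 0.5 ns at 10 A/ns the anchor has moved 5 A,
# while the ion is assumed to have moved only 4 A.
force = smd_force(k=5.0, v=10.0, t_ns=0.5, ion_displacement=4.0)
force_pN = force * KCAL_PER_MOL_A_TO_PN

# Distance covered by the anchor in a full 1-ns run at each quoted speed.
anchor_travel = [v * 1.0 for v in (10.0, 20.0)]
```

At the quoted speeds the anchor travels 10-20 Å during each 1-ns run.
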

Figures. Structural representations were generated using PyMol. Helices MA,
MB and MB′ have been removed for clarity in Fig. 1b, c, and helices M3-M4 of
L. pneumophila CopA and S. sonnei ZntA were aligned to generate Fig. 1c. In Fig. 3c,
the structures were aligned using the M4-M6 transmembrane helices, and the view
is from the extracellular side. Taking the E1-E2 conformational changes into account,
K693-D714 (S. sonnei ZntA) and R655-D684 (A. thaliana AHA2) almost overlap.
In H⁺-ATPases, A. thaliana AHA2 D684 participates in H⁺ transfer to the
extracellular side, and R655 has been proposed to stimulate H⁺ release from D684 and
prevent re-protonation. F210 of S. sonnei ZntA separates the electronegative ion
entry funnel from the membranous Zn²⁺ site and overlaps with A. thaliana AHA2
N106, which stabilizes the protonated AHA2 D684 (ref. 18) and blocks
intracellular H⁺ exchange. In Fig. 3d, the electron density is provided for the E2P state. A
deep pathway reaches the intramembranous high-affinity ion-binding site and
may allow Zn²⁺ release via E202. In Fig. 3e, the view is from the intracellular side.
Ion entry to S. sonnei ZntA may occur through negatively charged residues placed
inside the periphery of the positively charged residues of MB′. The view in Fig. 3f is
identical to that in Fig. 3e, displaying a highly electronegative funnel (negative
surface in red and positive in blue) formed by the residues of M1, M2 and M3. The funnel
extends towards the ion-binding CPC motif and is constricted by M187 and F210,
presumably guiding ions to the membranous high-affinity ion-binding site and
excluding non-transported compounds (see also Extended Data Table 2b). The view in
Fig. 3g is identical to that in Fig. 3e but for the Cu⁺-ATPase CopA of L. pneumophila
(PDB ID, 3RFU), for which ion uptake has been proposed to occur instead through
a transient Cu⁺-binding site.

Extended Data Figure 1 | Topology and reaction cycle of P-type ATPases.
a, Topology of ZntA, CopA and SERCA. Key residues in the HMBD and A, P, N
and M domains are highlighted. In ZntA, the negatively charged ion entry
funnel and release pathway are outlined. D436 in S. sonnei ZntA is the
autophosphorylated/dephosphorylated catalytic aspartate in the DKTGTXT
motif of the P domain. C392 and C394 in M4, K693 in M5 and D714 in M6 of
S. sonnei ZntA have been proposed to bind zinc in biochemical studies.
b, The Post-Albers (E1 to E2) reaction cycle of Zn²⁺-transporting P-type
ATPase. Phosphorylation events in the intracellular domains drive large
conformational changes that permit alternating access to transport sites in
the membrane about 50 Å from the ATP-targeted catalytic aspartate.
According to the model, a high-affinity state (E1), which is open to the
intracellular space, binds to Zn²⁺ and enters an occluded state. This state
then undergoes phosphorylation. Completion of this event (E1P) triggers the
release of the Zn²⁺, establishing an outward-facing, low-affinity state (E2P).
Release of the inorganic phosphate (Pi) yields the fully dephosphorylated
conformation (E2), which is followed by restoration of the inward-facing
conformation (E1), which initiates a new reaction cycle.

Extended Data Figure 2 | Structure-based sequence alignment of S. sonnei
ZntA and L. pneumophila CopA. Helix positions are indicated for S. sonnei
ZntA, and noteworthy residues are highlighted. Four of seven amino acid
positions in which ZntA differs between S. sonnei and Escherichia coli and that
are likely to be functionally irrelevant are highlighted in grey. The high-affinity
ion-binding residues C392, C394 and D714 are indicated in purple; the
catalytically phosphorylated aspartate and the dephosphorylating TGE motif
are highlighted in green. E202 and K693, which are possibly involved in ion
release, are marked in black. The alignment was performed using SALIGN.

Extended Data Figure 3 | Electron density of the determined E2·Pi state
of S. sonnei ZntA. a, Final 2Fo − Fc electron density of S. sonnei ZntA in the
E2·Pi state. The density is contoured at 1σ, and the view is equivalent to
that shown in Fig. 1a. b, c, Se-Met peaks calculated using Se-SAD (Se
single-wavelength anomalous diffraction) data and phases obtained from
molecular-replacement-guided model building. The anomalous difference
Fourier map is contoured at 3σ. A view of the entire protein (b), and a view of
the M domain (c) are shown.

Extended Data Figure 4 | Functional assays of S. sonnei ZntA. a, Wild-type
and ΔHMBD (inset) S. sonnei ZntA ATPase activity is stimulated by Zn²⁺,
Cd²⁺ and Pb²⁺. ATPase activity (normalized; the activity in the presence of
Zn²⁺ is set at 100% for the wild type and ΔHMBD, respectively) was
determined using the Baginski assay (see Methods for details). This ion
stimulation profile matches the one observed for ZntA from E. coli. The
mean ± s.d. of technical replicates is shown (n = 3). b, Effect of K⁺, Na⁺ and
Mg²⁺ on S. sonnei ZntA activity. The ATPase activity of wild-type S. sonnei
ZntA in detergent micelles or upon reconstitution in proteoliposomes, in
buffers containing exclusively Na⁺ or K⁺, as determined by the Baginski assay,
is shown. For Mg²⁺, the activity was measured in proteoliposomes with internal
buffers with or without MgCl₂. The mean ± s.d. of technical replicates is shown
(n = 3). c, Zn²⁺ and H⁺ transport across vesicle membranes. Zn²⁺ transport of
wild-type and D436N S. sonnei ZntA proteoliposomes monitored using the
zinc-selective chelator FluoZin-1 (left). H⁺ counter-ion transport in wild-type
S. sonnei ZntA or Ca²⁺-ATPase LMCA1 proteoliposomes monitored using the
pH indicator pyranine (right).

Extended Data Figure 5 | Structural comparison of ZntA and CopA.
a, Difference between the extracellular loops of S. sonnei ZntA and
L. pneumophila CopA. S. sonnei ZntA is coloured as in Fig. 1a and
L. pneumophila CopA is in dark green, and the proteins have been aligned
on helices M5 and M6. Note that the loops are substantially longer in
L. pneumophila CopA than in S. sonnei ZntA, which is a conserved difference
between Cu⁺- and Zn²⁺-transporting P-type ATPases (see also b).
b, c, Comparison of the extracellular loop lengths of ZntA (b) and CopA (c).
The lengths of the loops in S. sonnei ZntA and L. pneumophila CopA are shown,
as well as averages based on 521 ZntA-type proteins and 617 CopA-type
proteins (with less than 99% and 95% sequence identity within the ZntA
and CopA sequences, respectively).

Extended Data Figure 6 | The phosphorylation site of S. sonnei ZntA.
The domains are coloured as in Fig. 1a. AlF₄⁻/BeF₃⁻ (Al in orange, Be in green
and F in cyan) and the Mg²⁺ ion (grey) are associated with D436 (in the
DKTGTXT motif of the P domain) at the interface between the A and P
domains. D436, T438, T583, D628, N631 and D632 (in the P domain), as well as
T288, G289 and E290 (the TGE motif in the A domain that is associated with
dephosphorylation), are shown as sticks. Water molecules are shown as red
spheres (not modelled for the E2P state). a, The E2P–BeF₃⁻-bound state. The
catalytic D436 is protected from the TGE loop. b, The E2·Pi–AlF₄⁻-bound
state. E290 of the TGE loop probably activates a water molecule for
dephosphorylation as observed in the equivalent E2·Pi state of SERCA1a
and CopA.

Extended Data Figure 7 | The extracellular pathway. a, The extracellular
fraction of the E2–AlF₄⁻ crystal structure. Functionally important residues are
shown as sticks, and the protein is coloured as in Fig. 1a. The final 2Fo − Fc
electron density is contoured at 1σ. The view is equivalent to the one in Fig. 3d.
b, Dynamics of E202 in a 60-ns molecular dynamics simulation of the
E2–BeF₃⁻ structure in a dioleoylphosphatidylcholine (DOPC) membrane in the
absence of zinc. Selected residues are shown as sticks. Representative E202
conformations were captured at 16, 25 and 30 ns from snapshots aligned
according to backbone Cα atoms of M1-M4. The orientation of E202 at 16 ns
resembles how this side chain appears in the E2–AlF₄⁻ state, while the
flexibility observed throughout the simulation agrees with the observed poor
electron density of the side chain in the E2–BeF₃⁻ state (see Fig. 3b). Note that
there are two distorted lipids at the release pathway that may assist in Zn²⁺
release (vdW spheres represent lipid phosphates). c, Distance between the
centre of mass of the Cδ of the E202 side chain and the Nζ of the K693 side
chain during the 60-ns simulations of the E2–AlF₄⁻ and E2–BeF₃⁻ S. sonnei
ZntA structures in the absence of zinc, as a running average over five
consecutive frames of each trajectory. d, The release pathway and
accompanying protein interactions experienced by Zn²⁺ in a steered molecular
dynamics simulation originating from the centre of mass of residues C392,
C394 and D714. The transmembrane domain, lipid phosphates and water
within 7 Å of the protein are coloured as in b. e, The number of Zn²⁺–protein
interactions with a 5 Å cut-off during steered molecular dynamics (SMD)
simulations. Error bars correspond to counts from ten independent simulations
with pulling speeds on Zn²⁺ of 10-20 Å ns⁻¹.
Extended Data Figure 8 | Surface charge distribution and docking of the
HMBD to S. sonnei ZntA. a, Four views of the overall structure of E2-AlF₄⁻.
The view to the left is equivalent to that in Fig. 1a. The charge distribution
complies with the positive-inside rule for membrane proteins. The putative
ion entry funnel is indicated with a black arrow. b-d, Docking of the HMBD to
S. sonnei ZntA. The apo-HMBD of E. coli ZntA (PDB ID, 1MWY) docks to
the entry site region of S. sonnei ZntA using electrostatic complementation
and van der Waals interactions, as predicted by the ClusPro 2.0 server (b).
Equivalent view to that in a of S. sonnei ZntA without the HMBD (c).
View of the isolated HMBD, rotated 180° relative to a to show the surface
complementary to S. sonnei ZntA (d). The ion-binding cysteine residues C15
and C118 are highlighted.
Extended Data Table 1 | Data collection, phasing and refinement statistics

                      E2·Pi [AlF₄⁻]                E2P [BeF₃⁻]                  Se-E2·Pi [AlF₄⁻]
Data collection
Space group           C222₁                        P2₁                          C222₁
Cell dimensions
  a, b, c (Å)         77.6, 83.6, 319.8            54.5, 61.0, 141.5            76.1, 82.2, 314.8
  α, β, γ (°)         90.0, 90.0, 90.0             90.0, 96.0, 90.0             90.0, 90.0, 90.0
Resolution (Å)        50-2.70 (2.80-2.70)          50-3.20 (3.30-3.20)          50-4.50 (4.62-4.50)
Rmerge (%)            6.5 (126.7)                  20.7 (115.3)                 32.5 (99.1)
I/σI                  15.5 (1.04)                  8.07 (1.37)                  8.74 (3.40)
Completeness (%)      99.7 (93.2)                  99.7 (99.8)                  99.8 (99.0)
Redundancy            4.6 (4.6)                    4.6 (4.8)                    7.7 (7.8)
CC(1/2)† (%)          99.9 (73.5)                  99.0 (56.8)                  99.1 (84.3)

Refinement
Resolution (Å)        50-2.70 (2.75-2.70)          50-3.20 (3.40-3.20)
No. reflections       28862 (1294)                 15448 (2437)
Rwork/Rfree (%)       20.7/24.0 (25.9/48.7)        21.0/28.1 (26.9/33.9)
No. atoms
  Protein             4448                         4347
  Ligand/ion          6                            5
  Water               56                           0
B-factor
  Protein             77.9                         92.4
  Ligand/ion          46.3                         76.4
  Water               55.5
R.m.s. deviations
  Bond lengths (Å)    0.005                        0.004
  Bond angles (°)     0.932                        0.841

*The highest resolution shell is shown in parentheses. †CC(1/2) values were calculated using the program XDS.
Extended Data Table 2 | Statistical analysis of the ion entry region of S. sonnei ZntA


a Number of Asp + Glu in total Number of sequences Amino acid position Number of Asp or Glu Amino acid in SsZntA


0 182 264 A 
183 227 
184 144 E 
c Number of Asp + Glu in total Number of sequences Amino acid position Number of Asp or Glu Amino acid in LpCopA


617 153 A 
M 


; 0 154 
2 
3 

All 


0 155 


a, Conservation of the electronegative ion entry funnel in ZntA. The negative charges are provided in three blocks of surface-exposed residues in helices M1 (182, 183, 184), M2 (210, 211, 214, 215) and M3 (345, 
347, 348), in the vicinity of the negatively charged entry funnel of S. sonnei ZntA. Five hundred and twenty-one ZntA-type proteins with less than 99% sequence identity, selected from the latest UniProt database, 
were used for the analysis. b, The number of positively charged residues in the MB’ helix of ZntA proteins using the same data set as in a. c, Conservation of the CopA region equivalent to the electronegative ion entry 
funnel in ZntA. The number of negatively charged residues in the MB’ helix of CopA proteins. Six hundred and seventeen CopA-type proteins with less than 95% sequence identity, selected from the latest UniProt 
database, were used for the analysis. (See also Fig. 3e, f.) 
CAREER CHANGES


Open for business 


A master’s in business can offer scientists extra flexibility
or whole new career paths.
BY KAREN RAVN 


In June 2011, Karen Havenstrite graduated
from Stanford University in California
with a PhD in chemical engineering. A few
months later, she had invented a coating that
makes contact lenses more comfortable and had
co-founded a company to sell it.

She soon discovered that starting a business is
tough. “In the process of taking science from the
lab into the marketplace, I realized how much I
didn't know,” she says. She is learning it all now.
In 2015, she will graduate from Stanford again 
— this time with a master’s in business admin- 
istration (MBA). At about the same time, her 
company — Ocular Dynamics in Menlo Park, 
California — will begin to market her invention. 

Data from the US National Center for Educa- 
tion Statistics in Washington DC suggest that 
Havenstrite’s MBA will be one of about 194,000 


advanced business degrees awarded next June 
in the United States alone. Scientists make up 
a minority of MBA enrolment, but the degree 
is something that researchers should consider: 
it could facilitate advancement in their existing 
careers, or open up prospects by helping them 
to turn an idea into a business plan and then into 
a profitable venture (see ‘Start-ups’). 

Many working scientists feel that they have 
already racked up enough years — and debt — 
getting PhDs. Yet for some, ‘B-school’ might be
just the ticket, and there are ways to mitigate
what can be formidable costs. Some employ- 
ers will pay the tuition fees and promote the 
researchers when they complete the degree. 
And weekend classes offer a way to keep full- 
time jobs while completing the course. Time 
and money can also be saved by choosing a one- 
year programme instead of the more-traditional 
two-year stint. Online MBAs are usually the 
least expensive choice. 


POTENTIAL REWARDS 

The degree can offer returns on the investment, 
however. The Graduate Management Admis- 
sion Council, a non-profit organization based 
in Reston, Virginia, has found consistently high 
percentages of alumni who report that their 
degrees have paid off in terms of income and job
satisfaction. In a survey of nearly 21,000 alumni
of 132 business schools around the world, all 
graduating between 1959 and 2013, most said 
that their education had been rewarding per- 
sonally (94%), professionally (90%) and finan- 
cially (77%). 

There is no dearth of programmes to choose 
from — 13,000 worldwide, according to the 
Association to Advance Collegiate Schools of 
Business in Tampa, Florida. And for vetting, an 
abundance of organizations and publications 
rank programmes according to various criteria, 
including cost, peer and recruiter assessment, 
starting salary and bonus, employment rates 
and test scores. The location of the university 
and the length of the programme are likely to 
be crucial in making the choice. 

The MBA degree was created in the United 
States in 1908, and US universities have tended 
to dominate both in enrolments and in rank- 
ings. But institutions such as the London Busi- 
ness School; INSEAD in France, Singapore and 
Abu Dhabi; and the IE Business School in Spain 
have also found their ways onto some elite lists. 
People wanting a one-year programme are likely 
to have better luck looking in Europe, because 
shorter programmes are more popular there 
than in the United States.
START-UPS


Entrée to entrepreneurship


Some of the biggest companies in the
world — Amazon, Apple, Google — are
famous for being born in garages. As yet,
they are unrivalled by any of the fledgling
companies born in Stanford University’s
Startup Garage in California. But even though
the short course has been running for only
two years, several businesses it has spawned
are thriving, and a couple founded by
scientists should be under way soon. Enara
Health in San Mateo, California, will launch
early next year and uses mobile technology
to deliver ancillary health care interactively
with the goal of improving access and
follow-up for obesity and related conditions.
“Students begin the course with a need
and a user in mind,” says coordinator Ryann
Price. In teams of 2-4, students design
products, make prototypes, create business
models, test hypotheses and seek funding.
“We encourage them to build and test
simply and cheaply,” says Price, who reports
that students generally find the process
more difficult than they thought.

This autumn, Stanford is partnering with
Peking University in Beijing, China, to offer
The Startup Garage: The China Version.
One team in that course is trying to adapt
23andMe — a company in Mountain View,
California, that provides customers with
ancestry-related genetic reports — for the
China market. K.R.


A one-year programme is often the best
option for people who want to accelerate their
careers, rather than switch them, says Douglas
Stayman, former associate dean of MBA pro-
grammes at Cornell University’s Samuel Curtis
Johnson Graduate School of Management in
Ithaca, New York, which offers both one- and
two-year programmes. Stayman is now over-
seeing the MBA programme at Cornell Tech,
which in 2017 will move to a new campus in
New York City.

The IE Business School is one of a few high-
rated schools to offer online MBAs. The online
degree has the same admission requirements as
the full-time on-campus programme (online
students tend to be a few years older and have a
couple more years of work experience).

Earlier this year, the Graduate Manage-
ment Admission Council surveyed more than
3,000 students in their final year of business
school at 111 universities around the world and
found that nearly 60% already had job offers.
About one-quarter of the offers were in finance
and accounting, one-fifth each in consulting
and products and services (such as supply-chain
management, or getting goods to market) and
15% in technology. Only 5% of the offers were
in health care or pharmaceutical drug develop-
ment — prime possibilities for scientists — but
students seeking jobs in those areas were among
the most likely to get offers. Many schools say
that 80-90% of their students will be employed
by three months after their graduation.

“An MBA is the only advanced degree I know
of that expands your opportunities rather than
shrinks them,” says Dan Madden, senior man-
ager for strategic planning at Zoetis, an animal-
health company in Florham Park, New Jersey.
Before earning his MBA, Madden was a chemist
for Schering-Plough (now part of Merck).

Costs vary from university to university and
from nation to nation. In 2013, the US business
and technology news website Business Insider
compared the costs for the first year of an MBA
programme at 11 US schools. Tuition alone
ranged from about US$53,000 to $65,000, but
total costs, including room and board, insur-
ance, supplies and miscellaneous fees ranged
from about $81,000 to nearly $100,000.

At the London Business School, tuition 
— including reading materials but no other 
expenses — was £64,200 (US$102,000) for 
the 15-21 month programme that started in 
August. At INSEAD, tuition fees for the class 
graduating in 2015 were €62,500 ($79,000), but 
mandatory health-insurance and administra- 
tion fees add another €800, and living expenses 
are estimated at €22,300 in Singapore and at 
€23,600 in France. 

Sometimes, an MBA will expand a scientist's 
career opportunities so much that the science 
aspect loses much of its attraction. That hap- 
pened for Yasar Awan. Now a manager at Ray- 
theon in Boston, Massachusetts, overseeing the 
flow of materials from circuit boards to wires 
in support of a US Navy programme, Awan is 


a 2014 graduate of a five-year programme at 
Pennsylvania State University’s Smeal College 
of Business in University Park in which students 
earn a bachelor of science — his was in biol- 
ogy — and an MBA (see ‘Highlights’). Along 
the way, Awan discovered possibilities that he 
might never have seen. “I don’t really want to go
into science any more,” he admits. “I fell in love
with supply chain.”


VALUABLE SKILLS 

Scientists who get MBAs usually continue to 
think of themselves as scientists at some level, 
at least. And they are generally more comfort- 
able explaining how a new product or technol- 
ogy works — whether to fellow employees or 
to external clients — than are non-scientists, 
says Mark Pauly, a health-care-management 
researcher at the University of Pennsylvania's 
Wharton School in Philadelphia. 

In fact, “there are certain roles where we 
really request a science background”, says Beth
Keeler, vice-president for global acquisition and 
career planning at Pfizer in New York. “It’s not 
required, but many times it’s preferred.” 

Also, scientists are generally more analytical 
than the average business-school student, says 
finance manager Ian McFetridge, also at Pfizer. 
“They're good at breaking down problems into 
smaller pieces, good at seeing how different fac- 
tors are related.” On the other hand, he adds, 
“a lot of people coming out of science are not 
as natural at communicating and presenting 
ideas. But when it does happen that a scientist 
can communicate well, then you have someone 
who can take a complicated subject and simplify
it for the audience, and that’s exceptional.”
Such pairing of communication and science is 
particularly useful in consulting work, when 
MBA graduates need to explain to their clients 
how their suggestions for boosting revenue or 
efficiency will work. 

As a research scientist in New York City,
Brandan Hillerich developed new biotechnolo-
gies. He loved his work. But filing for patents
and coming up with commercial strategies to
market those technologies presented a whole
new challenge. “To me, this was even more
exciting than the bench science,” he says. And
he wanted more of it. So in May, he began a one-
year MBA programme at Cornell.


HIGHLIGHTS


Some MBA programmes of interest to scientists


Two US MBA programmes were designed
especially with scientists in mind. The
one-year programme at Cornell University’s
Samuel Curtis Johnson Graduate School of
Management in Ithaca, New York, is intended
for those who have already spent a good deal
of time and money getting advanced degrees
but now want to expand their expertise into
business. Students in this programme miss
out on the internships that two-year students
participate in during the summer between
their first and second years, but they have
the extra bonus of an intensive summer term
before the regular school year begins.

The Smeal College of Business at
Pennsylvania State University in University
Park offers a small selective programme
that allows students to earn both a bachelor
of science and an MBA in five years.
Students satisfy the requirements for their
undergraduate degrees in the first three
years, and during the next two years they
take the regular MBA programme with
other MBA students. (If students so choose,
they can take six years to complete the
programme.) K.R.

Some find business more exciting than sci- 
ence just because things can happen faster. In 
2004, Ana Albir graduated from the Massachu- 
setts Institute of Technology in Cambridge with 
a major in physics. But the research experience 
she had accumulated as an undergraduate led 
her to decide that a career in physics was not for 
her. “It can take decades to see results,” she says. 
In 2009, she graduated from Stanford with an 
MBA and she is now chief executive of Moon- 
drop Entertainment, a company she founded in 
2012 to create educational tablet apps for chil- 
dren. She still works on challenging problems, 
but now she can solve them in days or weeks 
simply by designing a clever piece of software. 
“With physics, I love the field,” she explains, “but
in what I do now, I love the field and the pace.”

Business requires as much creativity as sci- 
ence, say many PhD graduates who are pursu- 
ing MBAs. But advanced science degrees tend 
to be more of an individual pursuit, whereas 
business qualifications usually involve working 
as part of a team. “I like the collaboration,” says
Drew Rattigan, a second-year MBA candidate at
Smeal. “I like a more social environment.”

Teamwork is a big plus for Hillerich too. After
he had been in the programme at Cornell for
less than a month, he knew all his classmates
— at around 100, at least during the summer, a
much larger number than in a PhD programme.
Working in groups with them is great, he says.
“Everyone thinks about things so differently. It
expands your own thinking.”

An MBA was always on the cards for Ally 
Chang, who received a PhD in biomedical sci- 
ence from the University of Auckland in New 
Zealand in 2009 and an MBA from Cornell in 
2011 and is now the new-products commercial 
manager at Corning Life Sciences in Tewks- 
bury, Massachusetts. Her science background 
is a big asset when she works with researchers to 
decide whether their ideas will work in the marketplace.
“I’ve heard the comment — so many
PhDs, so few professorships,” Chang says. “But
even before I started my PhD, I knew I didn’t
want to be a professor. I wanted to do what I’m
doing.” Chang has always had a passion for science
— but she wants to turn her scientific ideas
into commercial products. 

Havenstrite and Hillerich, too, have chosen 
business for business's sake, because they believe 
it is a way to have a direct, positive impact on 
people’s lives. “That is why I got into science in
the first place,” Hillerich says. It is a sentiment 
expressed by many scientists who have, or are 
seeking, MBAs: they want to do work that has 
tangible, measurable effects, and soon, not in 
some abstract, distant future.


Karen Ravn is a freelance writer in Pacific 
Grove, California. 


TURNING POINT 


CAREERS 


Andrew Dove 


Andrew Dove was named the 2014 Royal 
Society of Chemistry Gibson-Fawcett 
Award winner in May. A chemist at the 
University of Warwick, UK, Dove describes 
his circuitous path into research on 
biodegradable materials for regenerative 
medicine, which involves replacing or 
regenerating human tissue. 


What area of chemistry first drew you in? 

In a word, catalysis — designing inorganic
catalysts that boost the efficiency or change 
the chemical properties of large polymers 
known as plastics. After working at BP 
Chemicals in Saltend, UK, during my fourth 
year of university, I thought I wanted to work 
on industry-sponsored projects — for exam- 
ple, using these catalysts to make polyethyl- 
ene, a chemically resistant plastic. 


Why did you initially focus on industry? 

It was probably my dad’s influence. Aca- 
demia was not on my list of potential careers. 
But I came to realize that I really enjoyed 
basic research and wanted to give it a go. I 
applied for a PhD at Imperial College Lon- 
don, where my adviser offered me a project 
making polylactide, which is now the most 
widely used biopolymer around, particularly 
in biomedical applications. Now that it can 
be made from corn, rather than from petro- 
chemicals, it is cheaper to use in the face of 
rising oil prices. 


How was your postdoc a turning point? 

My wife and I moved to the United States to 
pursue postdoc positions in a bid to build 
up our CVs. I was at Stanford University in 
California for about 15 months working on 
inorganic catalysis. Then my funding ran 
out. But my wife still had her postdoc fund- 
ing to work at IBM, and I was able to get a 
postdoc contract there too, in the company’s 
Center on Polymer Interfaces and Macromo- 
lecular Assemblies, which is funded by the 
US National Science Foundation. There, I 
started doing more organic catalysis. I had 
freedom to do whatever I wanted as long as 
good, publishable science was the result. It 
was a breakthrough period because it helped 
me to believe that I had good ideas and could 
translate them into interesting projects. 


Where did your research go from there? 

I should credit the American Chemical 
Society with my change in direction. The 
inorganic chemistry and polymer talks in 
their meetings were always at opposite ends of 


u 


the conference centre when I attended them 
in 2003 and 2004 — so I had to choose which 
I found more interesting, and polymers won. 
Those meetings proved crucial for helping 
me to understand where the cutting edge for 
creating new polymer materials really was at 
the time. I saw a couple of opportunities. One 
was to find ways to give biodegradable materi- 
als different physical properties and use those 
materials in high-value applications such as in 
biomedical devices. 


How did you approach your job search? 

I applied for jobs probably even before I 
was ready for them, and found that it really 
helped me to hone my research proposals. In 
2004, I started applying for academic posts. 
Rather than looking for jobs with ‘inorganic’ 
in the advert, I applied for a UK fellowship 
in nanoscience, and persuaded the univer- 
sity that I had the skills for the job. In 2005, 
I started my own group at the University of 
Warwick. 


What are you working on at the moment? 

I’m working on degradable polymers. One 
is a hydrogel material that could one day 
be combined with adult stem cells to make 
a scaffold able to regenerate a human spinal 
disc. Once the cells start to grow, the biologi- 
cal material would take over, leaving nothing 
synthetic in the body. 


How do you feel about media observations 
that Royal Society of Chemistry award 
winners often become Nobel laureates? 

I find it quite amusing. It would be lovely if 
that happened, but I think the press made a 
bit of a leap. I don't feel daunted by it because 
I don't take it seriously.


INTERVIEW BY VIRGINIA GEWIN 
SCIENCE FICTION


GOOGLE CAR TAKES THE TEST


BY NORMAN SPINRAD 


Whadya mean a ticket? Come on,
I’m a cop too, ain't I? Well almost,
so I’m only a driving-test officer, 


but I got the cap and the uniform, don't I? 
And I wasn't driving this damn thing, now, 
was I? 

Just my job, pal. Not my idea that Google 
Cars should have to have driving licences to 
hit the road like you and me. Good or bad 
idea, it’s the law, and we didn't make it, did 
we? My job is to test the thing, pass or 
fail, and this sucker didn't, and your 
job is to write a ticket, or yeah, in 
this case, a whole padfull, if a 
driver plays wise guy with the 
rules of the road. 

But you can’t ticket 
me, I wasn't driving, 
not for a minute. 
Take a good look, 
pal. No steering 
wheel, no brake 
pedal, no accelerator, 
no speedometer, no noth- 
ing. Yeah, you gotta write up 
a tonne of tickets, but you gotta 
ticket Google. 

I wasn't exactly against these driverless 
cars before, in my line of duty, you spend your 
working day being driven around by so-called 
human drivers half of which just manage to 
pass the road test and the other half who don't 
but usually don't quite get you killed. 

Which is why we got a reputation for look- 
ing for ways to fail you, especially towards 
the end of the shift, I mean, officer, wouldn't 
you? And seeing as how I see myself as a 
guardian of the safety of the road from what 
shouldn't be allowed to barrel along down 
it, I’m not against making these robots pass
the test too. 

Okay, I gotta admit I had it in for the thing, 
especially when it’s last on the line at the end 
of the day, looks like one of those cartoon 
cars out of that Disney movie, what can I tell 
you, I got three kids, and I just can't take any 
more cutsy-poo than I have to, can you? 

It's passed the written test or it wouldn't be 
on the line, how could it not since it’s really 
Google, and Google knows everything. 

It works on voice command, so I tell it
to open up and it lets me in. I tell it to close,
but it won't do that until I buckle in, fair
enough. Not every test car comes equipped
with dual steering wheels and pedals, it’s
unfortunately not mandatory, and you get used
to it. But when it gets through to me that this
thing doesn’t have any manual controls at all
I gotta admit it freaks me out a bit. However,
it’s the end of the shift, and the job is the job,
what are you gonna do, so...

The drive of your life.


I tell it to pull away from the kerb, which 
it does, but it immediately loses points for 
not looking back over its shoulder and hand- 
signalling, which, having no hands, head or 
shoulder, it can't, but those are the rules, 
dumb as they may be in modern times. 

Keeps good distance front and back in the 
straightaways, doesn't try to race through
yellow lights, turns right from the right lane 
and left from the left and all, maintains speed 
limits exactly like a prissy sissy, which is what 
you're supposed to do on the test. But keeps 
losing points for not hand-signalling so I've 
got plenty enough to fail it with even before 
the parking test. 

So I pick a space between a Toyota and a 
VW with maybe six inches front and back to 
squeeze into, this I gotta see... 

“Pull over and park here.”

Which, like magic, it does! 

Only it cuts off a Hell’s Angel on his Harley
doing it, and this dude is not amused, starts 
dismounting with his engine still running, 
blood in his eyes, and a big monkey wrench 
in his hand. 

“Get us out of here right now!” 

It pulls out sideways, but just sits there. 
“Go straight!”

It does. But the Harley is following us 
and the Google Car keeps to the speed limit 
like a good little citizen, which, as you gotta 
know, officer, is not a constraint on the Hell’s 
Angel. 

“Don't let that bike catch up with us!”

Nothing happens. 

“Uh... emergency override! Do what it 
takes!” 

Well the Google Car does. But within the
letter of the law. Doesn't break the speed limit.
Doesn't make illegal turns. What it does is
find itself a one-way street jammed with
crawling early rush-hour traffic, no sweat for
Google GPS in LA, and it bobs and weaves
through it at the exact speed limit with inches
to spare, to the point where I gotta close my
eyes to keep from having a heart attack.

When I open them, the Harley is 
nowhere in sight, but the Google Car is 
still doing it — up a freeway on-ramp! 

Should have told it what? In the middle 
of a crowded freeway on-ramp? Stop? Turn 
back? Wait your turn? 

What would you have said, wise guy? 

Whatsa matter, officer, cat got your tongue? 

Okay, okay, I shouldn't have said what 
I did, and if I hadn't maybe, just maybe, I 
wouldn't have found myself trapped in a so- 
called smartcar jigging and jagging at the 
legal speed limit through three lanes of traf- 
fic going 10 m.p.h. faster through the desert 
with a dozen Highway Patrol cars tailing it 
and trying to figure out what in hell to do 
about it until its battery finally ran down. 

I didn’t even realize it wasn't just inside 
my head when I screamed it. In fact, I think 
the first part was just in my head... Google, 
Schmoogle, these things... 

What did I tell the Google Car out loud? 
Tell me you wouldn't have said it, officer! 

“Drive like a drunken bank robber with a 
sack full of hundred dollar bills on his way 
to Las Vegas!”


Norman Spinrad has now been publishing 
novels in English for an actual half-century. His 
latest publication in English is the pamphlet 
Raising Hell. His latest novel has just been 
published in French as Police Du Peuple.